Abstract

We present a very general chaining method which allows one to control the supremum of the empirical process $\sup_{h\in H}\big|N^{-1}\sum_{i=1}^N h^2(X_i)-\mathbb{E}h^2\big|$ in rather general situations. We use this method to establish two main results. First, a quantitative (non-asymptotic) version of the classical Bai-Yin Theorem on the singular values of a random matrix with i.i.d. entries that have heavy tails, and second, a sharp estimate on the quadratic empirical process when $H=\{\langle t,\cdot\rangle : t\in T\}$, $T\subset\mathbb{R}^n$ and $\mu$ is an isotropic, unconditional, log-concave measure.

1 Introduction 

The main goal of this article is to obtain a non-asymptotic version of the 
Bai-Yin Theorem [5] on the largest and smallest singular values of certain 
random matrices. The Bai-Yin theorem asserts the following: 

Theorem 1.1 Let $A=A_{N,n}$ be an $N\times n$ random matrix with independent entries, distributed according to a random variable $\xi$, for which
$$\mathbb{E}\xi=0,\qquad \mathbb{E}\xi^2=1,\qquad \mathbb{E}\xi^4<\infty.$$
If $N,n\to\infty$ and the aspect ratio $n/N$ converges to $\beta\in(0,1]$, then
$$\frac{1}{\sqrt N}\,s_{\min}(A)\to 1-\sqrt\beta,\qquad \frac{1}{\sqrt N}\,s_{\max}(A)\to 1+\sqrt\beta$$
almost surely, where $s_{\max}$ and $s_{\min}$ denote the largest and smallest singular values of $A$.

Also, without the fourth moment assumption, $s_{\max}(A)/\sqrt N$ is almost surely unbounded.
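As a quick numerical illustration of Theorem 1.1 (not part of the original argument), the sketch below assumes NumPy, draws symmetric $\pm1$ entries, and compares $s_{\min}(A)/\sqrt N$ and $s_{\max}(A)/\sqrt N$ with $1\pm\sqrt\beta$. The helper names, the sizes and the seed are illustrative choices only.

```python
import numpy as np


def normalized_extreme_singular_values(N, n, sampler, seed=0):
    """Return (s_min(A) / sqrt(N), s_max(A) / sqrt(N)) for an N x n matrix A
    whose independent entries are produced by sampler(rng, shape)."""
    rng = np.random.default_rng(seed)
    A = sampler(rng, (N, n))
    s = np.linalg.svd(A, compute_uv=False)  # singular values in descending order
    return s[-1] / np.sqrt(N), s[0] / np.sqrt(N)


def rademacher(rng, shape):
    # Symmetric +-1 entries: E xi = 0, E xi^2 = 1 and E xi^4 = 1 < infinity.
    return rng.choice([-1.0, 1.0], size=shape)


N, n = 2000, 500  # aspect ratio beta = n / N = 0.25
beta = n / N
lo, hi = normalized_extreme_singular_values(N, n, rademacher)
print(f"s_min/sqrt(N) = {lo:.3f}, prediction 1 - sqrt(beta) = {1 - np.sqrt(beta):.3f}")
print(f"s_max/sqrt(N) = {hi:.3f}, prediction 1 + sqrt(beta) = {1 + np.sqrt(beta):.3f}")
```

At these sizes both normalized singular values should already sit close to the asymptotic predictions.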

The main result of this article is a quantitative version of the Bai-Yin Theorem.



Quantitative Bai-Yin Theorem. For every $q>4$ and $L>0$, there exist constants $c_1$, $c_2$, $c_3$ and $c_4$ that depend only on $q$ and $L$ for which the following holds. For every integer $n$, $\beta\in(0,1]$ and $N=n/\beta$, let $A=A_{N,n}$ be an $N\times n$ random matrix with independent, symmetric entries, distributed according to a random variable $\xi$ satisfying $\mathbb{E}\xi^2=1$ and $\mathbb{E}|\xi|^q\le L$. Then, for any $n\ge c_1$, with probability at least $1-c_2/(\beta n^{c_3})$,
$$1-c_4\sqrt\beta\le\frac{1}{\sqrt N}\,s_{\min}(A)\le\frac{1}{\sqrt N}\,s_{\max}(A)\le 1+c_4\sqrt\beta.$$
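To see the heavy-tailed regime of the quantitative theorem numerically, the following sketch (an illustration, not the authors' method) assumes NumPy and uses symmetric Student $t$ entries with 5 degrees of freedom rescaled to unit variance; such entries have $\mathbb{E}|\xi|^q<\infty$ exactly for $q<5$, so some $q>4$ is admissible. It reports the largest deviation of $s_{\min}/\sqrt N$, $s_{\max}/\sqrt N$ from 1, divided by $\sqrt\beta$, which the theorem says stays below $c_4$. The choice of distribution, sizes and seed are assumptions.

```python
import numpy as np


def student_t_unit_variance(rng, shape, df=5.0):
    # Symmetric heavy-tailed entries rescaled so that E xi^2 = 1. E|xi|^q is finite
    # exactly for q < df, so df = 5 allows some q in (4, 5), as the theorem requires.
    return rng.standard_t(df, size=shape) / np.sqrt(df / (df - 2.0))


def deviation_over_sqrt_beta(n, beta, seed=0):
    """max(|s_min/sqrt(N) - 1|, |s_max/sqrt(N) - 1|) / sqrt(beta) with N = n / beta."""
    rng = np.random.default_rng(seed)
    N = int(round(n / beta))
    s = np.linalg.svd(student_t_unit_variance(rng, (N, n)), compute_uv=False)
    dev = max(abs(s[-1] / np.sqrt(N) - 1.0), abs(s[0] / np.sqrt(N) - 1.0))
    return dev / np.sqrt(beta)


# The theorem predicts that these ratios stay below a constant c_4(q, L).
ratios = {beta: deviation_over_sqrt_beta(200, beta) for beta in (0.05, 0.1, 0.25)}
print(ratios)
```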

The proof of this result is based on the analysis of a more general scenario which has been studied extensively in recent years, in which the given matrix has independent rows, selected according to a reasonable measure on $\mathbb{R}^n$, rather than i.i.d. entries. Unlike the classical random matrix theory approach, one is naturally interested in the non-asymptotic behavior of the largest and smallest singular values of $\Gamma=N^{-1/2}\sum_{i=1}^N\langle X_i,\cdot\rangle e_i$ as a function of $N$ and $n$. We refer the reader to the surveys [34], [28] and references therein for the history and recent developments in the non-asymptotic theory of random matrices.

We will focus on the following questions: 

Question 1.2 Let $\mu$ be a symmetric measure on $\mathbb{R}^n$ and let $(X_i)_{i=1}^N$ be selected independently according to $\mu$.

1. Let $\Sigma_N=\frac1N\sum_{i=1}^N X_i\otimes X_i$ be the sample covariance matrix and $\Sigma=\mathbb{E}(X\otimes X)$. Given $\varepsilon>0$, is it true that with high probability, if $N\ge c(\varepsilon)n$ then $\|\Sigma_N-\Sigma\|_{2\to2}\le\varepsilon$?

2. If $X$ is an isotropic vector (that is, $\mathbb{E}\langle X,x\rangle^2=\|x\|_{\ell_2^n}^2$ for every $x\in\mathbb{R}^n$), are there "canonical" high probability bounds on $s_{\max}(\Gamma)$ and $s_{\min}(\Gamma)$? For example, under what conditions on $\mu$ are $s_{\max}(\Gamma)$ and $s_{\min}(\Gamma)$ of the order of $1\pm c\sqrt{n/N}$, as in the Bai-Yin Theorem?
Observe that the two questions are very similar. For example, it is straightforward to verify that if $\mu$ is isotropic, then both parts can be resolved by estimating the supremum of the empirical process
$$\sup_{x\in S^{n-1}}\Big|\frac1N\sum_{i=1}^N\langle X_i,x\rangle^2-\mathbb{E}\langle X,x\rangle^2\Big|. \tag{1.1}$$

And, in view of the second part of Question 1.2, we will be especially interested in the case $N\sim n$, that is, while keeping the aspect ratio $n/N$ constant.
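The reduction to (1.1) is plain linear algebra: for isotropic $X$ the supremum in (1.1) is the operator norm of $\Sigma_N-I$, and since $\Gamma^*\Gamma=\Sigma_N$ it also equals $\max(|s_{\max}^2(\Gamma)-1|,|s_{\min}^2(\Gamma)-1|)$. The sketch below checks this numerically, assuming NumPy and standard gaussian rows as a convenient isotropic example.

```python
import numpy as np

rng = np.random.default_rng(1)
N, n = 400, 50
X = rng.standard_normal((N, n))  # rows are the X_i; standard gaussian vectors are isotropic

Sigma_N = X.T @ X / N  # sample covariance (1/N) sum_i X_i (x) X_i
# By isotropy E<X, x>^2 = 1 on the sphere, so (1.1) equals sup_x |x^T Sigma_N x - 1|,
# the operator norm of the symmetric matrix Sigma_N - I.
sup_process = np.abs(np.linalg.eigvalsh(Sigma_N - np.eye(n))).max()

Gamma = X / np.sqrt(N)  # matrix of Gamma = N^{-1/2} sum_i <X_i, .> e_i
s = np.linalg.svd(Gamma, compute_uv=False)
via_singular_values = max(abs(s[0] ** 2 - 1), abs(s[-1] ** 2 - 1))

# Any single direction x on the sphere gives a lower estimate of the supremum.
x = rng.standard_normal(n)
x /= np.linalg.norm(x)
one_direction = abs(np.mean((X @ x) ** 2) - 1)
print(sup_process, via_singular_values, one_direction)
```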

When studying measures on $\mathbb{R}^n$ in this context, it is natural to divide the assumptions into two types: one on the $\ell_p^n$ norm of $X$ and the other on moments of linear functionals $\langle x,\cdot\rangle$.

To formulate the moment assumption we will use here, recall that for $\alpha\ge1$, the $\psi_\alpha$ Orlicz norm of a random variable $Z$ is defined by
$$\|Z\|_{\psi_\alpha}=\inf\big\{c>0:\ \mathbb{E}\exp(|Z|^\alpha/c^\alpha)\le2\big\},$$
and there are obvious extensions for $0<\alpha<1$. It is standard to verify that for every $\alpha>0$, $\|Z\|_{\psi_\alpha}$ is equivalent to $\sup_{q\ge1}\|Z\|_{L_q}/q^{1/\alpha}$.
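For a finitely supported random variable the infimum defining $\|Z\|_{\psi_\alpha}$ can be located by bisection, because $c\mapsto\mathbb{E}\exp(|Z|^\alpha/c^\alpha)$ is decreasing. The following is a minimal sketch for $\alpha\ge1$; the function name and the restriction to finitely supported $Z$ are assumptions made for illustration. For a Rademacher variable the defining equation $\exp(c^{-\alpha})=2$ gives $\|Z\|_{\psi_\alpha}=(1/\log2)^{1/\alpha}$, which the code reproduces.

```python
import math


def psi_alpha_norm(values, probs, alpha, rel_tol=1e-12):
    """inf{c > 0 : E exp(|Z|^alpha / c^alpha) <= 2} for a finitely supported Z
    (alpha >= 1), located by bisection since the expectation decreases in c."""
    if all(v == 0 for v, p in zip(values, probs) if p > 0):
        return 0.0  # Z = 0 almost surely

    def expectation(c):
        return sum(p * math.exp((abs(v) / c) ** alpha) for v, p in zip(values, probs))

    lo, hi = 0.0, 1.0
    while expectation(hi) > 2:  # enlarge hi until it is admissible
        lo, hi = hi, 2 * hi
    while hi - lo > rel_tol * hi:
        mid = 0.5 * (lo + hi)
        if expectation(mid) <= 2:
            hi = mid
        else:
            lo = mid
    return hi


# Rademacher Z: E exp(1 / c^alpha) = 2 gives ||Z||_{psi_alpha} = (1 / log 2)^{1/alpha}.
print(psi_alpha_norm([-1.0, 1.0], [0.5, 0.5], alpha=2))  # about 1.2011
print(psi_alpha_norm([-1.0, 1.0], [0.5, 0.5], alpha=1))  # about 1.4427
```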

Assumption 1.3 For $p,q\ge2$, a symmetric measure $\mu$ satisfies a $p$-small diameter, $L_q$ moment assumption with constants $\kappa_1$ and $\kappa_2$, if a random vector $X$ distributed according to $\mu$ satisfies
$$\|X\|_{\ell_p^n}\le\kappa_1n^{1/p}\ \text{ a.s., and for every } x\in S^{n-1},\ \ \|\langle X,x\rangle\|_{L_q}\le\kappa_2. \tag{1.2}$$

$\mu$ satisfies a small diameter $\psi_\alpha$ moment assumption if the $\psi_\alpha$ norm replaces the $L_q$ one in (1.2).

One should note that with very few exceptions, both parts of Assumption 1.3 are needed if one wishes to address Question 1.2.

The $p$-small diameter component, i.e. that $\|X\|_{\ell_p^n}\le\kappa_1n^{1/p}$ almost surely, is rather standard. Although it does not hold as stated even for a vector with i.i.d. gaussian entries, one may assume it without loss of generality unless $N$ is much larger than $n$. Indeed, in typical situations $\Pr(\|X\|_{\ell_p^n}\ge tn^{1/p})$ decays very quickly both in $t$ and in $n$. Therefore, $\max_{i\le N}\|X_i\|_{\ell_p^n}/n^{1/p}$ is bounded with very high probability, unless $N$ is considerably larger than $n$ (see Section 2 for more details). Hence, if $N\sim n$, which is the range we shall be interested in, a conditioning argument allows one to make the $p$-small diameter assumption.
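The following sketch illustrates the last remark for i.i.d. gaussian vectors (an assumed example; NumPy, $p=4$, and the sizes are arbitrary choices). Since $\|X\|_{\ell_4^n}^4/n$ concentrates around $\mathbb{E}g^4=3$, the quantity $\max_{i\le N}\|X_i\|_{\ell_4^n}/n^{1/4}$ stays close to $3^{1/4}\approx1.32$ when $N\sim n$, even though the almost sure bound fails.

```python
import numpy as np


def max_normalized_lp_norm(N, n, p, seed=0):
    """max_{i <= N} ||X_i||_{l_p^n} / n^{1/p} for N independent standard gaussian vectors in R^n."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((N, n))
    norms = np.sum(np.abs(X) ** p, axis=1) ** (1.0 / p)
    return norms.max() / n ** (1.0 / p)


# Keep N = n, the regime of interest; the ratio stays of order 3 ** 0.25 ~ 1.32.
values = {n: max_normalized_lp_norm(N=n, n=n, p=4) for n in (100, 400, 1600)}
print(values)
```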



Question 1.2 has been studied under the 2-small diameter assumption. In [27], Rudelson showed that if $\|X\|_{\ell_2^n}\le\kappa_1\sqrt n$ almost surely then for every $N\ge c_1n\log n$, with probability at least 0.99,
$$1-c_2\sqrt{\frac{n\log n}{N}}\le s_{\min}(\Gamma)\le s_{\max}(\Gamma)\le1+c_2\sqrt{\frac{n\log n}{N}}, \tag{1.3}$$
where $c_1,c_2$ are constants that depend only on $\kappa_1$.

It is straightforward to verify that this bound is optimal by considering the uniform measure on the set of coordinate vectors $\{\sqrt n e_1,\dots,\sqrt n e_n\}$, which results in the coupon-collector problem. Thus, given $\varepsilon>0$, one requires at least $c(\varepsilon)n\log n$ random points to ensure that the sample covariance matrix $\varepsilon$-approximates the true covariance. Of course, [27] does not lead to a nontrivial estimate in the second part of Question 1.2, i.e. if the aspect ratio $n/N\to\beta\in(0,1]$ and $n\to\infty$, and in particular, (1.3) cannot yield a Bai-Yin type of bound. Any hope of getting the desired bounds in Question 1.2 requires additional assumptions on $X$.
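The coupon-collector obstruction can be made concrete. If $X$ is uniform on $\{\sqrt n e_1,\dots,\sqrt n e_n\}$, then $\Sigma_N=\frac nN\,\mathrm{diag}(k_1,\dots,k_n)$, where $k_j$ counts how often $e_j$ was drawn, so $s_{\min}(\Gamma)=\sqrt{n\min_jk_j/N}$ vanishes until every coordinate has been hit. A minimal sketch, assuming NumPy; the helper names are hypothetical.

```python
import numpy as np


def coupon_collector_smin(n, N, seed=0):
    """s_min(Gamma) when X_1, ..., X_N are uniform on {sqrt(n) e_1, ..., sqrt(n) e_n}.

    Sigma_N = (n / N) * diag(k_1, ..., k_n) with k_j the number of draws of e_j,
    so s_min(Gamma) = sqrt(n * min_j k_j / N)."""
    rng = np.random.default_rng(seed)
    counts = np.bincount(rng.integers(0, n, size=N), minlength=n)
    return np.sqrt(n * counts.min() / N)


def expected_draws_to_cover(n):
    # Classical coupon-collector expectation n * H_n, which grows like n log n.
    return n * sum(1.0 / k for k in range(1, n + 1))


n = 200
print(expected_draws_to_cover(n))          # roughly n log n
print(coupon_collector_smin(n, N=n // 2))  # 0.0: fewer draws than coordinates
print(coupon_collector_smin(n, N=20 * n))  # positive, yet still well below 1 - sqrt(n/N)
```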

Turning to the moments component of Assumption 1.3, note that a bound on the $L_q$ moments of linear functionals means that $\|\langle X,x\rangle\|_{L_q}\le\kappa_2\|x\|_{\ell_2^n}$ for every $x\in\mathbb{R}^n$, and if, in addition, $X$ is isotropic, the $L_q$ and $L_2$ norms are equivalent on linear functionals. Moreover, in a similar fashion, a $\psi_\alpha$ assumption combined with isotropicity implies that the $\psi_\alpha$ and $L_2$ norms are equivalent.

Consider a situation when one only assumes such a moment condition. It is standard to verify that under a $\psi_2$ assumption, in which linear functionals exhibit a $\kappa_2$-subgaussian tail behavior (i.e., $\Pr(|\langle X,x\rangle|\ge t\kappa_2\|x\|_{\ell_2^n})\le2\exp(-t^2/2)$), then with probability at least $1-2\exp(-c_3n)$,
$$s_{\min}(\Gamma),\ s_{\max}(\Gamma)\in\Big[1-c_4\sqrt{n/N},\ 1+c_4\sqrt{n/N}\Big].$$

Indeed, a Bernstein type inequality shows that for each $x\in S^{n-1}$ and $0<t<1/\kappa_2$,
$$\Pr\Big(\Big|N^{-1}\sum_{i=1}^N\langle X_i,x\rangle^2-\mathbb{E}\langle X,x\rangle^2\Big|\ge t\Big)\le2\exp(-c_5Nt^2).$$
And, if one is to obtain an estimate on the empirical process (1.1), one has to control a $1/2$-net on the sphere, which is of cardinality $\sim\exp(c_6n)$. The tradeoff between the complexity of the indexing set and the concentration at hand shows that with the desired probability, $\sup_{x\in S^{n-1}}\big|N^{-1}\sum_{i=1}^N\langle X_i,x\rangle^2-\mathbb{E}\langle X,x\rangle^2\big|\lesssim\sqrt{n/N}$.
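The $\sqrt{n/N}$ rate in the subgaussian case can be observed numerically. The sketch below assumes NumPy and uses standard gaussian vectors as the subgaussian example; it computes $\|\Sigma_N-I\|_{2\to2}$, which equals the supremum in (1.1) for isotropic $X$, and divides by $\sqrt{n/N}$. For gaussian rows this ratio is roughly $2+\sqrt{n/N}$, so it should stay bounded as $N$ grows.

```python
import numpy as np


def covariance_error(N, n, seed=0):
    """||Sigma_N - I||_{2->2} for N independent standard gaussian vectors in R^n."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((N, n))
    return np.abs(np.linalg.eigvalsh(X.T @ X / N - np.eye(n))).max()


n = 100
ratios = []
for N in (200, 800, 3200):
    err = covariance_error(N, n)
    ratios.append(err / np.sqrt(n / N))
    print(N, round(err, 3), round(ratios[-1], 3))  # the last column stays bounded
```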

Unfortunately, when one has a weaker moment estimate than a $\psi_2$ one, the situation becomes considerably more difficult. The complexity of the set one has to control remains the same, but the individual concentration deteriorates, because $N^{-1}\sum_{i=1}^N\langle X_i,x\rangle^2$ does not exhibit a strong enough concentration around its mean to balance the concentration-complexity tradeoff at the level of $\sqrt{n/N}$. Therefore, with a weaker moment assumption than a $\psi_2$ one, a combination of individual tail bounds and a "global" assumption, like the small diameter information, is required in both parts of Question 1.2.

One situation in which the process (1.1) has been studied extensively in the last 15 years is a small diameter, $\psi_1$ moment assumption. The motivation for considering this situation comes from Asymptotic Geometric Analysis and the theory of log-concave measures, which are measures that have a symmetric, log-concave density. They fit the framework at hand nicely, because an isotropic, log-concave vector $X$ satisfies $\|X\|_{\ell_p^n}\le c_1n^{1/p}$ with probability at least $1-2\exp(-c_2n^{1/p})$. Indeed, the case $p=2$ was proved in [24], while for $p>2$ the result was recently established by Latała in [19]. Moreover, linear functionals exhibit a $\psi_1$ behavior (see, e.g., [12] for a survey on log-concavity).

Partial results in the isotropic, log-concave case have been obtained by Bourgain [9], yielding an estimate on the covariance operator for $N=c(\varepsilon)n\log^3n$, which was improved by Rudelson [27] to $N=c(\varepsilon)n\log^2n$. Subsequent improvements were $N=c(\varepsilon)n\log n$ for unconditional convex bodies in [13] and for general log-concave measures in [24]. Finally, the optimal estimate of $N=c(\varepsilon)n$ was obtained for unconditional, log-concave measures by Aubrun [3], and for an arbitrary log-concave measure in Adamczak et al. [1], [2], where the following result was proved:

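Theorem 1.4 There exist absolute constants $c_1$ and $c_2$ for which the following holds. If $\mu$ is an isotropic, log-concave measure, then for every $N\ge n$, with probability at least $1-\exp(-c_1\sqrt n)$,
$$\sup_{x\in S^{n-1}}\Big|\frac1N\sum_{i=1}^N\langle X_i,x\rangle^2-1\Big|\le c_2\sqrt{\frac nN}.$$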

Naturally, Question 1.2 becomes even harder when one assumes that linear functionals have heavy tails, because sums of independent random variables exhibit very limited concentration, far below the level required for the proof of Theorem 1.4. Recently, Vershynin [35] proved the following remarkable fact:

Theorem 1.5 For every $q>4$, $\delta>0$ and constants $\kappa_1$ and $\kappa_2$, there exist constants $c_1$ and $c_2$ that depend on $q$, $\delta$, $\kappa_1$ and $\kappa_2$ for which the following holds.

If $\mu$ satisfies a 2-small diameter, $L_q$ moment assumption with constants $\kappa_1$ and $\kappa_2$, then with probability at least $1-\delta$,
$$\|\Sigma_N-\Sigma\|_{2\to2}\le c_1(\log\log n)^2\Big(\frac nN\Big)^{1/2-2/q}.$$

In particular, if $\mu$ is isotropic then
$$1-c_2\Big(\frac nN\Big)^{1/2-2/q}(\log\log n)^2\le s_{\min}(\Gamma)\le s_{\max}(\Gamma)\le1+c_2\Big(\frac nN\Big)^{1/2-2/q}(\log\log n)^2.$$

Moreover, very recently Srivastava and Vershynin [29] obtained the following result:

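Theorem 1.6 For every $\eta>0$, $\varepsilon>0$ and $\kappa>0$ there exist constants $c_1$, $c_2$ and $c_3=\eta/(2\eta+2)$ for which the following holds. Let $\mu$ be an isotropic measure, satisfying that for every orthogonal projection $P$ in $\mathbb{R}^n$,
$$(*)\qquad\Pr\{\|PX\|_2^2>t\}\le\frac{\kappa}{t^{1+\eta}},\qquad\text{for } t>\kappa\,\mathrm{rank}(P).$$
If $(X_i)_{i=1}^N$ are independent random vectors distributed according to $\mu$, then for every $N\ge c_1n$,
$$\mathbb{E}\|\Sigma_N-I_d\|\le\varepsilon.$$
Moreover, under a $q$-moment assumption alone,
$$1-c_2\Big(\frac nN\Big)^{c_3}\le\mathbb{E}\,s_{\min}(\Gamma).$$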

It should be noted that the boundedness assumption in Theorem 1.6 is satisfied by a vector with independent components $X=(\xi_i)_{i=1}^n$ if $\xi\in L_q$ for $q>4$, and thus both parts may be used in the i.i.d. situation. However, for any $\eta>0$, $c_3<1/2$ ($1/2$ being the power in the Bai-Yin Theorem).

Our main result gives a version of Theorem 1.5 for an unconditional measure with "heavy tails".

Theorem A. Let $\mu$ be an unconditional measure that satisfies the $p$-small diameter, $L_q$ moment assumption with constants $\kappa_1$ and $\kappa_2$ for some $p>2$.

1. For every $q>4$ and $\delta<1/2-1/(2(p-1))$, there exist constants $c_0$, $c_1$ and $c_2$ that depend on $q$, $p$, $\kappa_1$, $\kappa_2$ and $\delta$, such that, for every $n\le N\le\exp(c_0n^\delta)$, with probability at least $1-\exp(-c_1n^\delta)$,
$$\sup_{t\in B_2^n}\Big|N^{-1}\sum_{i=1}^N\langle X_i,t\rangle^2-\mathbb{E}\langle X,t\rangle^2\Big|\le c_2\Big(\frac nN\Big)^{1/2}.$$

2. For every $2<q<4$, if $p>(1-2/q)^{-1}$ and $\delta<1/2-1/(2(p-1))$, there exist constants $c_3$ and $c_4$ that depend on $q$, $p$, $\delta$, $\kappa_1$ and $\kappa_2$, such that, for every $n\le N\le\exp(c_0n^\delta)$, with probability at least $1-\exp(-c_3n^\delta)$,
$$\sup_{t\in B_2^n}\Big|N^{-1}\sum_{i=1}^N\langle X_i,t\rangle^2-\mathbb{E}\langle X,t\rangle^2\Big|\le c_4\Big(\frac nN\Big)^{1-2/q}\log(N/n).$$

In both cases, for every $\varepsilon>0$, with probability at least $1-2\exp(-cn^\delta)$, $\|\Sigma_N-\Sigma\|_{2\to2}\le\varepsilon$ provided that $N\gtrsim_{q,p,\delta,\kappa_1,\kappa_2}n$. Moreover, if $\mu$ is isotropic and $q>4$, then
$$1-c_2\Big(\frac nN\Big)^{1/2}\le s_{\min}(\Gamma)\le s_{\max}(\Gamma)\le1+c_2\Big(\frac nN\Big)^{1/2},$$
and if $2<q<4$ then
$$1-c_4\Big(\frac nN\Big)^{1-2/q}\log(N/n)\le s_{\min}(\Gamma)\le s_{\max}(\Gamma)\le1+c_4\Big(\frac nN\Big)^{1-2/q}\log(N/n).$$



Our quantitative version of the Bai-Yin Theorem follows from Theorem A, because of the straightforward observation that if $\xi\in L_q$ for $q>4$ and is symmetric, then $X=(\xi_i)_{i=1}^n$ is unconditional, and there is some $p>2$ for which $\max_{i\le N}\|X_i\|_{\ell_p^n}\le cn^{1/p}$ with high enough probability. Thus, conditioning $\mu$ to the unconditional body $cn^{1/p}B_p^n$ yields the desired result.

The approach we take in the proof of Theorem A is very different from all the previous results mentioned above, as those rely heavily on the fact that the empirical process (1.1) is indexed by the sphere or by the Euclidean ball, and that the underlying class of functions consists of linear functionals. At the heart of the arguments are either the classical trace method [4], a non-commutative Khintchine inequality [27] or sharp estimates on $\max_{|I|=k}\|\sum_{i\in I}X_i\|_{\ell_2^n}$ [35]. As such, all these proofs are "Euclidean" in nature and cannot lead to bounds on the empirical process
$$\sup_{h\in H}\Big|\frac1N\sum_{i=1}^Nh^2(X_i)-\mathbb{E}h^2\Big| \tag{1.4}$$
for an arbitrary class of functions $H$, not even for $H_T=\{\langle t,\cdot\rangle:t\in T\}$ when $T$ is not the sphere or close to the sphere in some sense.

One should note that the process (1.4) is an interesting object in its own right. For example, it has a key role in analyzing the uniform central limit theorem [10]; and, when indexed by $H_T$ for $T\subset\mathbb{R}^n$, it appears naturally in Asymptotic Geometric Analysis, for example, when proving embedding results or "low-$M^*$" estimates for various matrix ensembles (see [22] for a more detailed discussion). Thus, understanding what governs (1.4), and in particular, going beyond the case $H_{S^{n-1}}$, is rather important.

The proof of Theorem A does just that, since it is based on a bound on (1.4) in terms of a certain notion of "complexity" of the class $H$. It is not tailored to the case $H_{S^{n-1}}$, nor does it rely on the fact that the indexing class consists of linear functionals. Rather, the proof is based on a chaining scheme which is much more general than the applications that will be presented here.

The second application we chose to present, as an illustration of the potential of this empirical-processes-based method, is the following.

Let $y_1,\dots,y_n$ be independent, standard exponential random variables (i.e., with density $\sim\exp(-\sqrt2|t|)$), and for every $T\subset\mathbb{R}^n$ set
$$E(T)=\mathbb{E}\sup_{t\in T}\sum_{i=1}^nt_iy_i,\qquad d_2(T)=\sup_{t\in T}\|t\|_{\ell_2^n}.$$
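For a finite $T$ both parameters are easy to approximate. The sketch below assumes NumPy and reads "standard exponential" as the symmetric (Laplace) law with density $\frac{1}{\sqrt2}\exp(-\sqrt2|t|)$, which has variance 1; $E(T)$ is estimated by Monte Carlo and $d_2(T)$ is computed exactly. For $T=\{e_1,-e_1\}$ one has $E(T)=\mathbb{E}|y_1|=1/\sqrt2$ and $d_2(T)=1$.

```python
import numpy as np


def sample_y(rng, size):
    # Symmetric exponential (Laplace) variables with density exp(-sqrt(2)|t|) / sqrt(2):
    # scale b = 1/sqrt(2), so E y = 0 and E y^2 = 2 b^2 = 1.
    return rng.laplace(loc=0.0, scale=1.0 / np.sqrt(2.0), size=size)


def estimate_E_and_d2(T, num_samples=200_000, seed=0):
    """Monte Carlo estimate of E(T) = E sup_{t in T} sum_i t_i y_i and the exact value of
    d_2(T) = sup_{t in T} ||t||_2, for a finite T given as the rows of an array."""
    T = np.asarray(T, dtype=float)
    rng = np.random.default_rng(seed)
    Y = sample_y(rng, (num_samples, T.shape[1]))
    E_T = (Y @ T.T).max(axis=1).mean()  # supremum over the finite set, then average
    d2_T = np.linalg.norm(T, axis=1).max()
    return E_T, d2_T


# T = {e_1, -e_1} in R^3: E(T) = E|y_1| = 1/sqrt(2) and d_2(T) = 1.
E_T, d2_T = estimate_E_and_d2([[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0]])
print(E_T, d2_T)
```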

Theorem B. There exist absolute constants $c_1$, $c_2$ and $c_3$ for which the following holds. If $\mu$ is an isotropic, unconditional, log-concave measure on $\mathbb{R}^n$ and $T\subset\mathbb{R}^n$ is centrally symmetric, then for every $u>c_1$, with probability at least $1-2\exp(-c_2u^2)$,
$$\sup_{t\in T}\Big|\frac1N\sum_{i=1}^N\langle t,X_i\rangle^2-\|t\|_{\ell_2^n}^2\Big|$$

To put Theorem B in the right context, recall that a symmetric measure $\nu$ on $\mathbb{R}^n$ $(\kappa,L)$-weakly dominates a symmetric measure $\mu$ if for every $x\in\mathbb{R}^n$ and every $t>0$, $\Pr_\mu(|\langle x,\cdot\rangle|>Lt)\le\kappa\Pr_\nu(|\langle x,\cdot\rangle|>t)$ [16]. For example, if $\mu$ is an isotropic $L$-subgaussian measure and $G=(g_1,\dots,g_n)$ is a standard gaussian vector in $\mathbb{R}^n$ then
$$\Pr_\mu(|\langle x,\cdot\rangle|>Lt)\le2\exp(-t^2/2\|x\|_{\ell_2^n}^2)=\Pr_G(|\langle x,\cdot\rangle|>t),$$
and thus $\mu$ is weakly dominated by $G$.

By the Majorizing Measures Theorem (see, e.g., [32] and Section 2), it follows that if $\mu$ is $L$-subgaussian, there is a constant $c=c(L)$ satisfying that for every $T\subset\mathbb{R}^n$ and every integer $N$,
$$\mathbb{E}\sup_{t\in T}\Big\langle\sum_{i=1}^NX_i,t\Big\rangle\le c\,\mathbb{E}\sup_{t\in T}\Big\langle\sum_{i=1}^NG_i,t\Big\rangle=c\sqrt N\,G(T), \tag{1.6}$$
where $(X_i)_{i=1}^N$ are independent copies of $X$, $(G_i)_{i=1}^N$ are independent copies of $G$ and $G(T)=\mathbb{E}\sup_{t\in T}\langle G,t\rangle$.

Moreover, the results of [21], [22] show that if $T$ is centrally symmetric and $\mu$ is isotropic and $L$-subgaussian, then



Hence, the fact that an $L$-subgaussian measure is weakly dominated by a gaussian measure (with the same covariance structure) is exhibited by a strong domination in (1.6) and in (1.7), that holds for every $T\subset\mathbb{R}^n$.

Just like subgaussian vectors, isotropic, unconditional log-concave vectors have a natural weakly dominating measure. By the Bobkov-Nazarov Theorem [7] they are $(\kappa,L)$-weakly dominated by the vector $Y=(y_1,\dots,y_n)$, and $\kappa$ and $L$ are absolute constants. In [18], Latała showed that as in (1.6), for every $T\subset\mathbb{R}^n$, $\mathbb{E}\sup_{t\in T}\langle\sum_{i=1}^NX_i,t\rangle\lesssim\mathbb{E}\sup_{t\in T}\langle\sum_{i=1}^NY_i,t\rangle$. Theorem B shows that the quadratic strong domination, analogous to (1.7), is also true in this case.

Theorem B has many standard applications, leading to embedding results of a similar nature to the Johnson-Lindenstrauss Lemma and to "low-$M^*$" estimates that hold for unconditional, log-concave ensembles. Deriving these and other outcomes from Theorem B is standard and will not be presented here. One should also note that a log-concave Chevet type inequality, i.e., upper estimates on the operator norm $\|\Gamma\|_{X\to Y}$ for finite dimensional normed spaces $X$ and $Y$, has recently been established in [3].

In the next section we will present several preliminary facts and definitions that will be used throughout this article. Then, in Section 3 we will show that if $V\subset\mathbb{R}^N$ can be decomposed in a certain way, the Bernoulli process indexed by $\{(v_i^2)_{i=1}^N:v\in V\}$ is well behaved. Section 4 is devoted to the observation that if $H$ is a class of functions, then under mild assumptions and with high probability, the random coordinate projection $P_\sigma H=\{(h(X_i))_{i=1}^N:h\in H\}$ can be decomposed in the sense of Section 3. It turns out that the decomposition depends on the complexity of $H$ and on the decay of tails of functions in $H$. Finally, in Section 5 we will present examples in which the complexity of $H$ can be estimated, leading to the proofs of Theorem A (and consequently, the quantitative Bai-Yin Theorem) and of Theorem B.




2 Preliminaries



Throughout, all absolute constants are positive numbers, denoted by $c,c_0,c_1,\dots$, and their value may change from line to line. $\kappa_0,\kappa_1,\dots$ denote constants whose value will remain unchanged. By $A\sim B$ we mean that there are absolute constants $c$ and $C$ such that $cB\le A\le CB$, and by $A\lesssim B$ that $A\le CB$. $A\sim_\gamma B$ (resp. $A\lesssim_\gamma B$) denotes that the constants depend only on $\gamma$.

For $1\le p\le\infty$, $\ell_p^n$ is $\mathbb{R}^n$ endowed with the $\ell_p$ norm, which we denote by $\|\cdot\|_{\ell_p^n}$, and $B_p^n$ is its unit ball. With a minor abuse of notation we write $|\cdot|$ both for the cardinality of a set and for the absolute value. Finally, if $(a_n)$ is a sequence, let $(a_n^*)$ be a non-increasing rearrangement of $(|a_n|)$.

Next, let us turn to the complexity parameters that motivated our method of analysis: Talagrand's $\gamma$-functionals.

Definition 2.1 [32] For a metric space $(T,d)$, an admissible sequence of $T$ is a collection of subsets of $T$, $\{T_s:s\ge0\}$, such that for every $s\ge1$, $|T_s|\le2^{2^s}$ and $|T_0|=1$. For $\beta\ge1$, define the $\gamma_\beta$ functional by
$$\gamma_\beta(T,d)=\inf\sup_{t\in T}\sum_{s=0}^\infty2^{s/\beta}d(t,T_s),$$
where the infimum is taken with respect to all admissible sequences of $T$. For an admissible sequence $(T_s)_{s\ge0}$ we denote by $\pi_st$ a nearest point to $t$ in $T_s$ with respect to the metric $d$.
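Since $\gamma_2$ is an infimum over admissible sequences, any explicit admissible sequence gives an upper bound. The sketch below, for a finite $T\subset\mathbb{R}^n$ with the Euclidean metric, builds one such sequence from a greedy farthest-point ordering ($T_0$ is the first point, $T_s$ the first $2^{2^s}$ points). The ordering heuristic is an assumption chosen for simplicity; it is not claimed to be near-optimal and is not taken from the text. NumPy is assumed.

```python
import numpy as np


def farthest_point_order(T):
    """Order the points of T greedily: each new point is the one farthest from those chosen."""
    order = [0]
    dist = np.linalg.norm(T - T[0], axis=1)
    while len(order) < len(T):
        nxt = int(dist.argmax())
        order.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(T - T[nxt], axis=1))
    return order


def gamma2_upper_bound(T):
    """sup_t sum_s 2^{s/2} d(t, T_s) for the admissible sequence T_0 = {first point},
    T_s = first 2^{2^s} points of a farthest-point ordering (s >= 1), Euclidean metric."""
    T = np.asarray(T, dtype=float)
    order = farthest_point_order(T)
    total = np.zeros(len(T))
    s = 0
    while True:
        size = 1 if s == 0 else 2 ** (2 ** s)
        Ts = T[order[:min(size, len(T))]]
        # distance from every t to its nearest point pi_s t in T_s
        d = np.linalg.norm(T[:, None, :] - Ts[None, :, :], axis=2).min(axis=1)
        total += 2 ** (s / 2) * d
        if size >= len(T):  # T_s = T from here on, so all further terms vanish
            break
        s += 1
    return total.max()


print(gamma2_upper_bound([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0]]))  # 3.0
```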

One should note that our chaining approach is based on a slightly less restrictive definition, giving one more freedom; for example, the cardinality of the sets will not necessarily be $2^{2^s}$, the metric may change with $s$, etc. (see Section 3).

When considered for a set $T\subset L_2$, $\gamma_2$ has close connections with properties of the canonical gaussian process indexed by $T$, and we refer the reader to [10], [32] for detailed expositions on these connections. One can show that under mild measurability assumptions, if $\{G_t:t\in T\}$ is a centered gaussian process indexed by a set $T$, then
$$c_1\gamma_2(T,d)\le\mathbb{E}\sup_{t\in T}G_t\le c_2\gamma_2(T,d),$$
where $c_1$ and $c_2$ are absolute constants and for every $s,t\in T$, $d^2(s,t)=\mathbb{E}|G_s-G_t|^2$. The upper bound is due to Fernique [11] and the lower bound is Talagrand's Majorizing Measures Theorem [30]. Note that if $T\subset\mathbb{R}^n$, $(g_i)_{i=1}^n$ are standard, independent gaussians and $G_t=\sum_{i=1}^ng_it_i$, then $d(s,t)=\|s-t\|_{\ell_2^n}$, and therefore
$$c_1\gamma_2(T,\|\cdot\|_{\ell_2^n})\le\mathbb{E}\sup_{t\in T}\sum_{i=1}^ng_it_i\le c_2\gamma_2(T,\|\cdot\|_{\ell_2^n}). \tag{2.1}$$
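The middle term of (2.1) can be estimated by simulation when $T$ is finite. The sketch below assumes NumPy; for $T=\{0,e_1\}$ the supremum is $\max(0,g_1)$, whose mean is $1/\sqrt{2\pi}\approx0.399$. The absolute constants $c_1,c_2$ in (2.1) are not specified in the text, so the sketch only estimates the gaussian side.

```python
import numpy as np


def gaussian_sup(T, num_samples=400_000, seed=0):
    """Monte Carlo estimate of E sup_{t in T} sum_i g_i t_i for a finite T (rows of an array)."""
    T = np.asarray(T, dtype=float)
    rng = np.random.default_rng(seed)
    G = rng.standard_normal((num_samples, T.shape[1]))
    return (G @ T.T).max(axis=1).mean()


# T = {0, e_1}: the supremum is max(0, g_1), with mean 1/sqrt(2*pi).
est = gaussian_sup([[0.0, 0.0], [1.0, 0.0]])
print(est, 1 / np.sqrt(2 * np.pi))

# T = {+-e_1, ..., +-e_n}: E max_i |g_i| grows like sqrt(2 log n).
n = 64
print(gaussian_sup(np.vstack([np.eye(n), -np.eye(n)]), num_samples=20_000))
```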

A part of our discussion (Theorem B) will be devoted to isotropic, log-concave measures on $\mathbb{R}^n$.

Definition 2.2 A symmetric probability measure $\mu$ on $\mathbb{R}^n$ is isotropic if for every $y\in\mathbb{R}^n$, $\int|\langle x,y\rangle|^2d\mu(x)=\|y\|_{\ell_2^n}^2$.

The measure $\mu$ is log-concave if for every $0<\lambda<1$ and every nonempty Borel measurable sets $A,B\subset\mathbb{R}^n$, $\mu(\lambda A+(1-\lambda)B)\ge\mu(A)^\lambda\mu(B)^{1-\lambda}$.

A typical example of a log-concave measure on $\mathbb{R}^n$ is the volume measure of a convex body in $\mathbb{R}^n$, a fact that follows from the Brunn-Minkowski inequality (see, e.g., [26]). Moreover, Borell's inequality [8], [23] implies that there is an absolute constant $c$ such that if $\mu$ is an isotropic, log-concave measure on $\mathbb{R}^n$, then for every $x\in\mathbb{R}^n$, $\|\langle x,\cdot\rangle\|_{\psi_1}\le c\|\langle x,\cdot\rangle\|_{L_2}=c\|x\|_{\ell_2^n}$.

As mentioned in the introduction, if $X$ is distributed according to an isotropic, log-concave measure on $\mathbb{R}^n$ then the tail of $\|X\|_{\ell_p^n}$ decays quickly at scales that are larger than $n^{1/p}$. Thus, by conditioning, the main result in [24] shows that a 2-small diameter assumption can be made without loss of generality as long as $N\le\exp(c\sqrt n)$, and Latała [19] proved the analogous result for $p>2$, as long as $N\le\exp(cn^{1/p})$.

3 Decomposition of sets 

We begin with a description of the modified chaining procedure. Let $(\eta_s)_{s\ge0}$ be an increasing sequence which satisfies that for every $s\ge0$, $2^{\eta_s}\cdot2^{\eta_{s+1}}\le10\cdot2^{\eta_{s+2}}$ and for $s\ge1$, $1.1\le\eta_{s+1}/\eta_s\le10$ (where $1.1$ can be replaced by $1+\varepsilon$ and $10$ can be any suitably large constant). For example, $\eta_0=0$ and $\eta_s=2^s$ for $s\ge1$ is the usual choice of a sequence that has been used in the definition of Talagrand's $\gamma$ functionals. An admissible sequence of $V\subset\mathbb{R}^N$ relative to $(\eta_s)_{s\ge0}$ is a collection of subsets $V_s\subset V$ for which $|V_s|\le2^{\eta_s}$. For every $s$ let $\pi_s:V\to V_s$, which usually will be a nearest point map relative to some distance. We will denote $\pi_sv-\pi_{s-1}v$ by $\Delta_sv$, and sometimes write $\Delta_0v$ for $\pi_0v$. Finally, $\Delta_sV$ is the set $\{\Delta_sv:v\in V\}$.
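The two growth conditions on $(\eta_s)$ are easy to check on a finite initial segment. The helper below is a hypothetical illustration (its name and default parameters are assumptions); it compares base-2 exponents to avoid overflow and confirms that $\eta_0=0$, $\eta_s=2^s$ qualifies, while the linear sequence $\eta_s=s$ does not.

```python
import math


def satisfies_eta_conditions(eta, growth_lo=1.1, growth_hi=10.0, C=10.0):
    """Check, on the finite segment eta[0], ..., eta[m-1], that the sequence is increasing,
    that 2^{eta_s} * 2^{eta_{s+1}} <= C * 2^{eta_{s+2}} for s >= 0, and that
    growth_lo <= eta_{s+1} / eta_s <= growth_hi for s >= 1."""
    if any(b <= a for a, b in zip(eta, eta[1:])):
        return False  # must be increasing
    for s in range(len(eta) - 2):
        # compare base-2 exponents instead of the (possibly huge) powers themselves
        if eta[s] + eta[s + 1] > math.log2(C) + eta[s + 2]:
            return False
    for s in range(1, len(eta) - 1):
        if not growth_lo <= eta[s + 1] / eta[s] <= growth_hi:
            return False
    return True


print(satisfies_eta_conditions([0] + [2 ** s for s in range(1, 21)]))  # True
print(satisfies_eta_conditions(list(range(20))))                      # False
```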

Let $\phi$ be an increasing function which will be chosen according to additional information one will have on the given class. Examples that one should have in mind are $\phi_\beta(x)\sim_\beta\sqrt x\log^{1/\beta}(eN/x)$, resulting from a bound on the $\psi_\beta$ diameter of $H$, or $\phi_{q,\varepsilon}(x)\sim_{q,\varepsilon}N^{(1+\varepsilon)/q}x^{1/2-(1+\varepsilon)/q}$ for $q>2$ and $\varepsilon$ in the right range, arising from an $L_q$ moment assumption.

Assume that $V\subset\mathbb{R}^N$ is endowed with a family of functionals $\theta_s$ and a semi-norm $\|\cdot\|$ (which, in our applications, will either arise from the $L_q$ norm or from the $\psi_\beta$ norm), and set $d=\sup_{v\in V}\|v\|$.

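Definition 3.1 $V\subset\mathbb{R}^N$ admits a decomposition with constants $\alpha$ and $\gamma$ if it has an admissible sequence $(V_s)_{s\ge0}$ relative to $(\eta_s)_{s\ge0}$ for which the following holds.

1. $\sup_{v\in V}\big(\theta_0(\pi_0v)+\sum_{s>0}\theta_s(\Delta_sv)\big)\le\gamma$.

2. For every $v\in V$ and every $I\subset\{1,\dots,N\}$,
$$\Big(\sum_{i\in I}v_i^2\Big)^{1/2}\le\alpha\big(\gamma+d\,\phi(|I|)\big).$$

3. If $\eta_s<N$ then for every $v\in V$ and every $I\subset\{1,\dots,N\}$,
$$\Big(\sum_{i\in I}(\Delta_sv)_i^2\Big)^{1/2}\le\alpha\big(\theta_s(\Delta_sv)+\|\Delta_sv\|\phi(|I|)\big),$$
and if $\eta_s\ge N$ then for every $v\in V$ and every $I\subset\{1,\dots,N\}$,
$$\Big(\sum_{i\in I}(\Delta_sv)_i^2\Big)^{1/2}\le\alpha\,\theta_s(\Delta_sv).$$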

Although this definition seems artificial at first glance, we will show that it captures the geometry of a typical coordinate projection $P_\sigma H=\{(h(X_i))_{i=1}^N:h\in H\}$.

The main observation of this section is that one can use this type of decomposition to bound the supremum of the Bernoulli process indexed by $V^2=\{(v_i^2)_{i=1}^N:v\in V\}$. Hence, if $V=P_\sigma H$, then a standard symmetrization argument leads to the desired bound on $\sup_{h\in H}\big|N^{-1}\sum_{i=1}^Nh^2(X_i)-\mathbb{E}h^2\big|$ (see Section 5.3).

To formulate the estimate on the Bernoulli process, set

for $\eta_s<N$, put
$$A_1=\sup_{v\in V}\sum_{\{s>0:\eta_s<N\}}\phi(\eta_s)\|\Delta_sv\|,\qquad A_2=\sup_{v\in V}\sum_{\{s>0:\eta_s<N\}}\phi^2(\eta_s)\|\Delta_sv\|,$$
and let
$$A_\Phi=\sup_{v\in V}\sum_{\{s:\eta_s<N\}}\Phi_s\,\eta_s^{1/2}\|\Delta_sv\|.$$

For $2<q<4$ and $0<\varepsilon<(q/2)-1$, let
$$B_{q,\varepsilon}=\sup_{v\in V}\sum_{\{s:\eta_s<N\}}\eta_s^{1-2(1+\varepsilon)/q}\|\Delta_sv\|.$$

As will become clearer, the most important of the $B_{q,\varepsilon}$ parameters is
$$B_4=B_{4,0}=\sup_{v\in V}\sum_{\{s:\eta_s<N\}}\eta_s^{1/2}\|\Delta_sv\|,$$
which, under the standard choice of $\eta_0=0$ and $\eta_s=2^s$ for $s\ge1$, corresponds to $\gamma_2(V,\|\cdot\|)$.

Theorem 3.2 There exist absolute constants $c_0$, $c_1$ and $c_2$ for which the following holds. If $V\subset\mathbb{R}^N$ has a decomposition as in Definition 3.1, then for every $r\ge c_0$, with probability at least $1-2\exp(-c_1r^2\eta_0)$,
$$\sup_{v\in V}\Big|\sum_{i=1}^N\varepsilon_iv_i^2\Big|\le c_2r\alpha^2\Big(\gamma\big(\gamma+d\,\phi(N)+A_1\big)+d\big(A_2+A_\Phi\big)\Big).$$

Before presenting the proof, let us consider the two main examples which will interest us, namely, the families $\phi_\beta=\sqrt x\log^{1/\beta}(eN/x)$ for any $\beta>0$ and $\phi_{q,\varepsilon}=\sqrt x\,(N/x)^{(1+\varepsilon)/q}$ for any $q>2$ (and for $\varepsilon$ selected appropriately).

In both cases $\phi(N)\sim\sqrt N$ and for any $\beta>0$, $\Phi\sim_\beta\sqrt N$. If $q>4$ and $0<\varepsilon<q/4-1$, then $\Phi\le(1-4(1+\varepsilon)/q)^{-1/2}\sqrt N$, and since $\Phi_s\le\Phi$, then for $\beta>0$ or $q>4$,
$$A_\Phi\le\Phi\sup_{v\in V}\sum_{\{s:\eta_s<N\}}\eta_s^{1/2}\|\Delta_sv\|\lesssim\sqrt N\,B_4,$$
with the constant depending either on $\beta$ or on $q$ and $\varepsilon$ as above.
On the other hand, if $2<q<4$ and $0<\varepsilon<q/2-1$ then
$$\Phi_s\lesssim\frac{N^{2(1+\varepsilon)/q}}{1-2(1+\varepsilon)/q}\,\eta_s^{1/2-2(1+\varepsilon)/q}.$$
Therefore, in that range
$$A_\Phi\lesssim\frac{N^{2(1+\varepsilon)/q}}{1-2(1+\varepsilon)/q}\,B_{q,\varepsilon}.$$

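Next, since $(\eta_s)_{s\ge0}$ increases exponentially, then for $\beta>0$ or $q>2$,
$$\sum_{\{s:\eta_s<N\}}\phi(\eta_s)\|\Delta_sv\|\le2d\sum_{\{s:\eta_s<N\}}\phi(\eta_s)\lesssim d\sqrt N, \tag{3.1}$$
and the constant in (3.1) depends on $\beta$ or on $q$ and $\varepsilon$ respectively. In particular, if $2<q<4$ and $0<\varepsilon<q/2-1$, then
$$\sum_{\{s:\eta_s<N\}}\phi(\eta_s)\|\Delta_sv\|\lesssim\frac{d\sqrt N}{1-2(1+\varepsilon)/q}.$$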
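The geometric-sum step behind (3.1) can be checked numerically for the two families, under the standard choice $\eta_0=0$, $\eta_s=2^s$ and with the conventions $\phi(0)=0$ and $\log$ the natural logarithm (both assumptions made for the sketch). It computes $\sum_{\{s:\eta_s<N\}}\phi(\eta_s)/\sqrt N$, which should stay bounded in $N$; for $2<q<4$ the bound degrades like $1/(1-2(1+\varepsilon)/q)$. The parameter values $\beta=2$, $(q,\varepsilon)=(4.5,0.1)$ and $(3,0.2)$ are arbitrary.

```python
import math


def phi_beta(x, N, beta):
    # phi_beta(x) = sqrt(x) * log^{1/beta}(eN/x), with phi(0) = 0 (the eta_0 = 0 term).
    return 0.0 if x == 0 else math.sqrt(x) * math.log(math.e * N / x) ** (1.0 / beta)


def phi_q(x, N, q, eps):
    # phi_{q,eps}(x) = sqrt(x) * (N/x)^{(1+eps)/q}, with phi(0) = 0.
    return 0.0 if x == 0 else math.sqrt(x) * (N / x) ** ((1.0 + eps) / q)


def chain_sum_ratio(phi, N):
    """(sum over {s : eta_s < N} of phi(eta_s)) / sqrt(N), for eta_0 = 0, eta_s = 2^s."""
    etas = [0] + [2 ** s for s in range(1, N.bit_length() + 1) if 2 ** s < N]
    return sum(phi(e) for e in etas) / math.sqrt(N)


for N in (10 ** 3, 10 ** 5, 10 ** 7):
    print(N,
          round(chain_sum_ratio(lambda x: phi_beta(x, N, 2.0), N), 2),
          round(chain_sum_ratio(lambda x: phi_q(x, N, 4.5, 0.1), N), 2),
          round(chain_sum_ratio(lambda x: phi_q(x, N, 3.0, 0.2), N), 2))
```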

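Finally, one has to control $\sum_{\{s:\eta_s<N\}}\phi^2(\eta_s)\|\Delta_sv\|$. Note that if $\beta>0$ or $q>4$, then
$$\sum_{\{s:\eta_s<N\}}\phi^2(\eta_s)\|\Delta_sv\|\le\Big(\max_{\{s:\eta_s<N\}}\frac{\phi^2(\eta_s)}{\eta_s^{1/2}}\Big)\sum_{\{s:\eta_s<N\}}\eta_s^{1/2}\|\Delta_sv\|\lesssim\sqrt N\sum_{\{s:\eta_s<N\}}\eta_s^{1/2}\|\Delta_sv\|\le\sqrt N\,B_4,$$
and if $2<q<4$ and $0<\varepsilon<q/2-1$ then
$$\sum_{\{s:\eta_s<N\}}\phi^2(\eta_s)\|\Delta_sv\|\le N^{2(1+\varepsilon)/q}\sum_{\{s:\eta_s<N\}}\eta_s^{1-2(1+\varepsilon)/q}\|\Delta_sv\|\le N^{2(1+\varepsilon)/q}B_{q,\varepsilon}.$$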

We thus arrive at a more compact formulation of Theorem 3.2 in the cases we will be interested in.
Corollary 3.3 For any $\beta>0$ or $q>4$, with probability at least $1-2\exp(-c_1r^2\eta_0)$,
$$\sup_{v\in V}\Big|\sum_{i=1}^N\varepsilon_iv_i^2\Big|\lesssim r\alpha^2\Big(\gamma^2+d\sqrt N\,(\gamma+B_4)\Big),$$
with a constant that depends on $\beta$ or on $q$ and $\varepsilon$ respectively.

Also, if $2<q<4$ and $0<\varepsilon<q/2-1$, then with probability at least $1-2\exp(-c_1r^2\eta_0)$,
$$\sup_{v\in V}\Big|\sum_{i=1}^N\varepsilon_iv_i^2\Big|\lesssim\frac{\alpha^2r}{1-2(1+\varepsilon)/q}\Big(\gamma^2+d\sqrt N\,\gamma+d\,N^{2(1+\varepsilon)/q}B_{q,\varepsilon}\Big).$$



Proof of Theorem 3.2. For every $\Delta_sv$ let $i$ be the largest integer in $\{1,\dots,N\}$ for which $\theta_s(\Delta_sv)\ge\|\Delta_sv\|\phi(i)$. Throughout the proof we will assume that such an integer exists, and if it does not, the necessary modifications to the proof are obvious. Let $i_{s,v}=\max\{i,\eta_s\}$ and put $I_{s,v}$ to be the set of the largest $i_{s,v}$ coordinates of $|\Delta_sv|$. Let $\Delta_s^+v=P_{I_{s,v}}\Delta_sv$ and $\Delta_s^-v=P_{I_{s,v}^c}\Delta_sv$ be the projections of $\Delta_sv$ onto the sets of coordinates $I_{s,v}$ and $I_{s,v}^c$ respectively. Also, let $j$ be the largest integer in $\{1,\dots,N\}$ for which $\gamma\ge d\phi(j)$. Thus, for every $v\in V$, $\big(\sum_{i=1}^j(v_i^*)^2\big)^{1/2}\le2\alpha\gamma$ and for every $\ell>j$, $v_\ell^*\le2\alpha d\phi(\ell)/\sqrt\ell$. If $J$ is the set of the largest $j$ coordinates of $v\in V$, let $v^+=P_Jv$ and $v^-=P_{J^c}v$.

Let $w\cdot v=\sum_{i=1}^Nw_iv_ie_i$ denote the coordinatewise product, and since
$$v^2-(\pi_0v)^2=\sum_{s>0}\big((\pi_sv)^2-(\pi_{s-1}v)^2\big)=\sum_{s>0}(\Delta_sv)\cdot(\pi_sv+\pi_{s-1}v),$$
one has to control increments of the form $\sum_{i=1}^N\varepsilon_i(\Delta_sv)_i(\pi_sv+\pi_{s-1}v)_i$.
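The telescoping identity, and its effect inside the Bernoulli process, can be verified on a toy chain. In the sketch below the chain $\pi_0v,\dots,\pi_4v$ is random and hypothetical (its last element plays the role of $v$, as happens once $V_s=V$); only the algebra is being checked. NumPy is assumed.

```python
import numpy as np

rng = np.random.default_rng(2)
N, num_levels = 6, 5
# Hypothetical chain pi_0 v, ..., pi_4 v in R^N whose last element is v itself.
chain = [rng.standard_normal(N) for _ in range(num_levels)]
v = chain[-1]

increments = [chain[s] - chain[s - 1] for s in range(1, num_levels)]  # Delta_s v
lhs = v ** 2 - chain[0] ** 2  # coordinatewise squares
rhs = sum(d * (chain[s] + chain[s - 1]) for s, d in enumerate(increments, start=1))

# The same identity inside the Bernoulli process: sum_i eps_i (v^2 - (pi_0 v)^2)_i.
eps = rng.choice([-1.0, 1.0], size=N)
bern_lhs = eps @ lhs
bern_rhs = sum(eps @ (d * (chain[s] + chain[s - 1])) for s, d in enumerate(increments, start=1))
print(np.allclose(lhs, rhs), np.isclose(bern_lhs, bern_rhs))
```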
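Observe that if $\eta_s\ge N$ then with probability 1,
$$\Big|\sum_{i=1}^N\varepsilon_i\big((\Delta_sv)\cdot(\pi_sv+\pi_{s-1}v)\big)_i\Big|\le\|(\Delta_sv)\cdot(\pi_sv+\pi_{s-1}v)\|_{\ell_1^N}\le2\|\Delta_sv\|_{\ell_2^N}\sup_{u\in V}\|u\|_{\ell_2^N}\le2\alpha^2\theta_s(\Delta_sv)\big(\gamma+d\,\phi(N)\big).$$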



Next, if $\eta_s<N$ we will decompose the vectors one has to control according to the size of their coordinates, because, with probability $1-2\exp(-r^2/2)$,
$$\begin{aligned}\Big|\sum_{i=1}^N\varepsilon_i(\Delta_sv)_i(\pi_sv+\pi_{s-1}v)_i\Big|\le{}&\|(\Delta_s^+v)\cdot(\pi_sv+\pi_{s-1}v)\|_{\ell_1^N}\\&+r\|(\Delta_s^-v)\cdot((\pi_sv)^++(\pi_{s-1}v)^+)\|_{\ell_2^N}+r\|(\Delta_s^-v)\cdot((\pi_sv)^-+(\pi_{s-1}v)^-)\|_{\ell_2^N}.\end{aligned} \tag{3.2}$$

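Consider the following two cases. If $i_{s,v}=\eta_s$ then
$$\|(\Delta_s^+v)\cdot(\pi_sv+\pi_{s-1}v)\|_{\ell_1^N}\le\|\Delta_s^+v\|_{\ell_2^N}\|P_{I_{s,v}}(\pi_sv+\pi_{s-1}v)\|_{\ell_2^N}\lesssim\alpha^2\|\Delta_sv\|\phi(\eta_s)\big(\gamma+d\,\phi(\eta_s)\big).$$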

Moreover,
$$\|\Delta_s^-v\|_{\ell_\infty^N}\le\frac{\|\Delta_s^+v\|_{\ell_2^N}}{\sqrt{\eta_s}}\le\frac{2\alpha\|\Delta_sv\|\phi(\eta_s)}{\sqrt{\eta_s}},$$

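and thus, for every $v\in V$,
$$\eta_s^{1/2}\|(\Delta_s^-v)\cdot w^+\|_{\ell_2^N}\le\eta_s^{1/2}\|\Delta_s^-v\|_{\ell_\infty^N}\|w^+\|_{\ell_2^N}\lesssim\alpha^2\gamma\,\phi(\eta_s)\|\Delta_sv\|.$$

To estimate $\eta_s^{1/2}\|(\Delta_s^-v)\cdot w^-\|_{\ell_2^N}$, observe that since $(\Delta_s^-v)_i^*\le\alpha\|\Delta_sv\|\phi(\eta_s+i)/\sqrt{\eta_s+i}$, $w_i^*\le\alpha d\phi(i)/\sqrt i$ and $\sum_ia_ib_i\le\sum_ia_i^*b_i^*$, then
$$\eta_s^{1/2}\|(\Delta_s^-v)\cdot w^-\|_{\ell_2^N}\le\alpha^2d\,\eta_s^{1/2}\|\Delta_sv\|\Big(\sum_{i=1}^N\frac{\phi^2(\eta_s+i)}{\eta_s+i}\cdot\frac{\phi^2(i)}{i}\Big)^{1/2}\le\alpha^2d\,\Phi_s\,\eta_s^{1/2}\|\Delta_sv\|.$$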

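Therefore, summing the three terms over $\{s>0:\eta_s<N\}$,
$$\sum_{\{s>0:\eta_s<N\}}\|(\Delta_s^+v)\cdot(\pi_sv+\pi_{s-1}v)\|_{\ell_1^N}\lesssim\alpha^2\Big(\gamma\sum_{\{s>0:\eta_s<N\}}\phi(\eta_s)\|\Delta_sv\|+d\sum_{\{s>0:\eta_s<N\}}\phi^2(\eta_s)\|\Delta_sv\|\Big),$$
$$\sum_{\{s>0:\eta_s<N\}}\eta_s^{1/2}\|(\Delta_s^-v)\cdot((\pi_sv)^++(\pi_{s-1}v)^+)\|_{\ell_2^N}\lesssim\alpha^2\gamma\sum_{\{s>0:\eta_s<N\}}\phi(\eta_s)\|\Delta_sv\|$$
and
$$\sum_{\{s>0:\eta_s<N\}}\eta_s^{1/2}\|(\Delta_s^-v)\cdot((\pi_sv)^-+(\pi_{s-1}v)^-)\|_{\ell_2^N}\lesssim\alpha^2d\sum_{\{s>0:\eta_s<N\}}\Phi_s\,\eta_s^{1/2}\|\Delta_sv\|.$$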



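Next, if $i_{s,v}\ne\eta_s$ then $\|\Delta_s^+v\|_{\ell_2^N}\le2\alpha\theta_s(\Delta_sv)$, and thus
$$\|(\Delta_s^+v)\cdot(\pi_sv+\pi_{s-1}v)\|_{\ell_1^N}\lesssim\alpha^2\theta_s(\Delta_sv)\big(\gamma+d\,\phi(N)\big).$$
Since $|I_{s,v}|\ge\eta_s$,
$$\|\Delta_s^-v\|_{\ell_\infty^N}\le\frac{2\alpha\theta_s(\Delta_sv)}{|I_{s,v}|^{1/2}}\le\frac{2\alpha\theta_s(\Delta_sv)}{\eta_s^{1/2}},$$
then splitting each $w\in V$ to $w^++w^-$ as above,
$$\eta_s^{1/2}\|(\Delta_s^-v)\cdot w^+\|_{\ell_2^N}\le\eta_s^{1/2}\|\Delta_s^-v\|_{\ell_\infty^N}\|w^+\|_{\ell_2^N}\lesssim\alpha^2\gamma\,\theta_s(\Delta_sv),$$
and
$$\eta_s^{1/2}\|(\Delta_s^-v)\cdot w^-\|_{\ell_2^N}\lesssim\alpha^2d\,\Phi_s\,\eta_s^{1/2}\|\Delta_sv\|.$$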

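Therefore,
$$\begin{aligned}(*)=\sum_{\{s>0:\eta_s<N\}}\Big(&\|(\Delta_s^+v)\cdot(\pi_sv+\pi_{s-1}v)\|_{\ell_1^N}+\eta_s^{1/2}\|(\Delta_s^-v)\cdot((\pi_sv)^++(\pi_{s-1}v)^+)\|_{\ell_2^N}\\&+\eta_s^{1/2}\|(\Delta_s^-v)\cdot((\pi_sv)^-+(\pi_{s-1}v)^-)\|_{\ell_2^N}\Big)\le(3)+(4)+(5),\end{aligned}$$
where (3) and (4) denote the sums over $\{s>0:\eta_s<N\}$ of the first two estimates above, and
$$(5)\lesssim\alpha^2d\sum_{\{s>0:\eta_s<N\}}\eta_s^{1/2}\Phi_s\|\Delta_sv\|.$$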

Recall that $|\Delta_sV|,|V_s|\le10\cdot2^{\eta_{s+1}}$ and that $\eta_{s+1}\le10\eta_s$. Given $r\ge c_0$, applying (3.2) with $r$ replaced by $10r\eta_s^{1/2}$ and summing over $\{s:\eta_s<N\}$, it follows that $\sup_{v\in V}\big|\sum_{i=1}^N\varepsilon_i\big(v^2-(\pi_0v)^2\big)_i\big|$ is bounded by the desired quantity with probability at least $1-2\exp(-c_1r^2\eta_0)$.

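Finally, for $v\in V_0$, let $i$ be the largest integer in $\{1,\dots,N\}$ for which $\theta_0(v)\ge\|v\|\phi(i)$, and set $I$ to be the set of the $i$ largest coordinates of $v$. Thus, $\sum_{i\in I}v_i^2\le2\alpha^2\theta_0^2(v)\le2\alpha^2\gamma^2$, and for $\ell>i$, $v_\ell^*\le\alpha\|v\|\phi(\ell)/\sqrt\ell$. Since $|V_0|\le2^{\eta_0}$, then with probability at least $1-2\exp(-c_2r^2\eta_0)$, for every $v\in V_0$,
$$\Big|\sum_{i=1}^N\varepsilon_iv_i^2\Big|\lesssim\alpha^2\big(\gamma^2+r\,d\,\Phi\,\eta_0^{1/2}\|v\|\big),$$
completing the proof.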
4 Coordinate projections of Function classes

The aim of this section is to show that under very mild assumptions, empirical processes have well behaved coordinate projections in the sense of Definition 3.1. A first result in this direction was established in [22], in which the main observation, formulated in the language of Section 3, was that if $\eta_0=0$ and $\eta_s=2^s$ for $s\ge1$, then for the choice of $\theta_s\big((h(X_i))_{i=1}^N\big)=2^{s/2}\|h\|_{\psi_2}$, $\alpha=\sqrt u$, $\|\cdot\|=\|\cdot\|_{\psi_2}$ and $\phi(x)\sim\sqrt{x\log(eN/x)}$, the set $V=\{(h(X_i))_{i=1}^N:h\in H\}$ has a good decomposition with high probability. Hence, the Bernoulli process indexed by $V^2$ satisfies the following:

Theorem 4.1 There exist absolute constants $c_1$, $c_2$ and $c_3$ for which the following holds. If $H$ is a class of functions, then for every $r,u\ge c_1$, with $\mu^N$-probability at least $1-2\exp(-c_2u)$, $V=P_\sigma H$ satisfies that



with probability at least $1-2\exp(-c_3r^2)$ with respect to the Bernoulli random variables.

Theorem l4.1l is rather restricted because the V^-based complexity param- 
eter seems too strong in many situations, as does the assumption that H is 
a bounded subset of . Here, we will try to impose as few assumptions as 
possible on H. 

Let H be a class of functions on (Cl,fj,). For every u > we will 
define three events in the product space Q N , which will be denoted by 
&i,u, ^2,u and ^3,u- On the event f2i jU n ^2,u H ^3, u , the random set 
P a H = {(h(Xi))^ 1 : h G H} will be well behaved for the right choice 
of functionals 6 S and <j>. We will then study cases in which the event 
f2l )U Pi H.2 )U n ^3 jU has high probability. 

Definition 4.2 For (ij s ) s >q as above, set sq > to be the first integer for 
which rj s > log(eiV). 

For every s G {s : log(eiV) < rj s < N}, let i s be the largest integer in 
{1, ...,N} for which r/ s > i\og(eN/l), and ifrj s < \og(eN), set £ s = 1. 

The motivation for this definition is the following. If is the collection 
of subsets of {1, N} of cardinality k, sq is the level above which one may 
find k for which the cardinalities \Ek\ and \H S \ are comparable. Indeed, 
when s < sq, \E\\ can be significantly larger than \H S \ = 2 r,s , but when 
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s > sq, logl-ffgl and logl-E^J are of the same order, and thus one may 
simultaneously control every function in H s and every subset in Ei s at no 
extra price. The main idea of the proofs in this section is to try and balance 
these two quantities as much as possible. 

Observe that since (%)^ =0 g rows exponentially, so does (£ s )s>s Q - 

Definition 4.3 For an admissible sequence (H s ) s >o and a sequence of Junc- 
tionals {6 u ,s)s>so> let ^ e th> e event for which, for every h £ H, the 
following holds: 

1. for every log(eiV) < n s < N , (^tT ((&sh) 2 (Xi))*) ^ < e u , s (A s h), 

(and if the u£ s +i > N then the sum terminates at N ). 

1 /2 

2. for every Vs > N, (^((A^) 2 ^*))*) < 9 u , s (A s h). 

s. pr(K^) 2 (^))f /2 <^oKM. 

The set f2i iU is the subset of Q N in which the functionals 9 UjS yield a good 
bound on the £2 norm of the "relatively large" coordinates of each increment 
when s > sq. In contrast, on the set $72, u the smaller coordinates will be 
controlled for s > sq. One of the key points of the proof is finding an 
estimate on the norm on these coordinates, but doing so without any 
real concentration phenomenon for sums of i.i.d. random variables coming 
to one's aid. 

Formally, to define the set £l2,u, first fix a random variable Y, an integer 
N and e > 0. For every j < N let Sj = (j/eN)( 1+e \ set 

Vj = ini{y : Pr(\Y\ > Vj ) < S d }, 

and without loss of generality, we will assume that the infimum is attained. 
For every 1 < k < N, let 

f u (Y,k) = K 3 v^[ E vy* 

\{j:2i<[k/u]} 

where K3 is a suitable chosen absolute constant. 

The motivation for this definition is the following observation, showing 
that with high probability, the "tail" of a sum of i.i.d random variables can 
be controlled using /. 
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Lemma 4.4 There exist absolute constants c\ and c<i for which the fol- 
lowing holds. For every integer I and u > c\je, with probability at least 
1 — 2exp(— C2ue£log(eN / £)) , for every integer k > ul, 

k 

E (Y?r<fu(Y,k) 

i=ue+i 

Proof. Since Pr(\Y\ > yj) < Sj = (j/eN) 1+£ then for u > 1, 

PrOGj > W ) <QW < exp( U jlog(eiV/uj) - (1 + e)«j log(eiV/j)) 

< exp(-euj log(eJV/j')). 

Thus, summing over {j = + 2 l /it] : 2* < — it follows that with 
probability at least 1 — exp(— c\eul\og(eN / 1)) , if 2 % < k — ul then Y* £+2t < 
y\e+2y u y Therefore, 



E o$r z E 2 ^ + 2^r < E 2, 4.- v « 

j=u£+l {i:2 i <k-ui} {i:2 l <k-u£} 

<c 2 u E 2'^ = / 2 (Y,£ ; ), 

{j:2J<A;/u} 

where the last inequality is evident by a change of variables. ■ 
We will also need the following "global" counterpart of the functional /. 



Definition 4.5 Given a class of functions H , an integer N and e > 0, set 
Zj = inf{z : supPr(|h| > Zj) < {j/eN) 1+£ }. 

heH 

For every k < N and u > 1, let 

F u {k) = k 3 v^ I E 2h 

\{j:2J<k/u} 

Clearly, for every h £ H and every k, f u (h, k) < F u {k). 



i r 2 



Definition 4.6 Let £li,u be the event on which, for every h E H , every 
s > sq md every j > ut s 
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/ \ 1/2 

1- (EL^+i((A^) 2 (XO)*) <f u (A s h,j), 

/ \ 1/2 

2- (ELuts+i^sh) 2 ^))*) < F u (j). 

The final set, is very close in nature to Q-2,u- It is needed to control 
the coordinates of "very small" increments - when s < sq, if such an integer 
exists. 

Definition 4.7 If t]q < log(eN), let ^ls^ u be the event on which for every 
h £ H, every < s < sq and 1 < j < N, 

fi fi \ 1/2 

( ^((A^) 2 ^))*) < f u (A s h,j), \y2((ir So h) 2 (X t )yj < F u (j). 

//770 > log(eiV) set n 3>u = Q N . 

It turns out that on the event tt± jU fl &.2,u H ^3 «, the set P a H is indeed 
well behaved. Let 

7„ = inf sup V] 9 U)S (A s h), (4.1) 

with the infimum is taken with respect to all (?7 s )-admissible sequences. 
From here on we will assume that (H s ) s >q is an almost optimal (77s) s >o _ 
admissible sequence. 

Lemma 4.8 There exists absolute constants c\ and C2 for which the fol- 
lowing holds. Let (6 u ,s)s>s be functionals, and for s < sq set 9 UyS = 0. 
For every u > c\, on the event Q± :U n f^2,u H ^3,^, for every h G H and 
IC{1,...,N}, 

1- if Vs < N then 

^g(A s /i) 2 pQ)^) < 9 u , s (A s h) + f u (A s h, \I\), 
and if n s > N, 

1/2 



^2(A s h) 2 (X. t ) < 9 UjS (A s h). 
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(E^Pq) < lu + E F{o2u2 l ) + R S0 {hJ\ 

Viel / {i:2«<|7|} 

where R ao ,i(h) = O ufi (Tr h) ifs = andR SQ (h,I) = min{0 U)SO (Tr so h), F U (\I\)} 
otherwise. 

Proof. First, assume that log(eiV) < r/ s < N (i.e. s > sq) and recall that 
£ s is the largest integer for which rj s > £\og{eN/£). If |/| < ui s then the 
claim follows from the definition of 9 US and the set fii u . If |/| > u£ s , then 

/ \V2 ,ut s \l/2 / |/| \V2 

Vie/ / \t=l / \i=u^+l 

and the claim is evident from the definition of the function f u and the set 

If, on the other hand, r) s < log(eiV) then so > and the assertion follows 

from the definition of ^3^. 

The second part of (1) follows from the definition of 
Turning to (2), we shall treat two cases. First, consider the case |/| > 

u£ SQ and observe that it suffices to estimate (Yl'i=i~ 1 ((' 7r sh) 2 (Xi))*) 1 / 2 . 

deed, let s be an integer for which u£ s < \I\ < u£ s+ \. Since £ s+ \ is nonde- 

creasing, then on f2i jU , 

'u£ s+1 \ I ui a+1 \ I ui a+1 

E ^ E E ((a^) 2 (x,))* + E (Wwr 

i=l / j>s+l \ i=l J \ i=l 

< e + E (W) 3 (*or 

If J C / is the set of the largest u£ s coordinates of ((vr s /i)(Xj))^ 1 in /, 
then the coordinate projections satisfy that 

P / ((7r s M(X0)^ 1 =Pj((7r s _ 1 / l )(X0)iI 1 +P J ((A s / l )(X0)iI 1 +P 7V (( 7 r s / l )(X i ))iI 1 
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and thus, 



\ 1/2 /u£ s+1 

1/2 / \ 1/2 



< max 
|/i|=«4 



1/2 



53 ((vr^) 2 ^))* 

i=«4+l 

Hence, if we set Uj jS (h) = max|j| =tt ^ +1 (^iei"( 7Fs ^) 2 (^)) then for every 
s, and every h £ H 



1/2 



W s , s (/i) <W s _i >s _i(/i) + max V(A s /i) 2 pQ) 

|/i|=«^ 1 



+ 53 (M) a (**)V 

\i=u£ s +l 

<W,_i,._ 1 (/i) + e UiS (A s h) + F u (u4+i). 
Summing over all s > so, 

s s+1 

Z4, s (^)< 53 e uA A J h )+ 53 Fuiul^+Us^h), 

j=s +l j=so+l 

and thus, for every h £ H and every I C {1, iV}, 

(53 /i 2 ^)") < 53 ^.(Ajh) + J] F u « + i)+W S0)S0 (h). 
Viei" / s>s {s>s -.e B <\i\} 

Next, one has to bound sup^gij max|/|< M * +1 (X^e/C^o 

/i) 2 ^)) 172 . This 

is at most U)SO {-K so h) on and when sq > 0, it is also bounded by 
F u (u£ S0 ) < Fu(\I\) on fi 3)U . 

The claim in this case follows since £ s grows exponentially for s > sq, 
and thus 

53 F u (u£ s+1 )< 53 F u (cu2 l ) 
{s>s Q -.e s <\I\} {i&<\i\} 
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for a suitable absolute constant c. 

Turning to the second case, if |/| < u£ SQ , note that 

/ \i/2 / \l/2 / vl/2 

X> 2 m ^ E E( A ^) 2 (^) + E(^o^) 2 (^) 

Vie/ / s>«0 ViG/ /Vie/ / 

< E UiS (A s h)+mm{9 UtSO (TT so h),F u (\I\)}. 



s>s 



For Lemma 14.81 to have any meaning, one has to identify the functionals 
f u , F u and U)S in the cases one is interested in. Our next goal is to study 
the functions f u and F u under various tail assumptions on functions in H, 
and naturally, the two families of tail estimates we will be interested in are 
when H has a bounded diameter in L^ fj or in L q for q > 2. 

If H C L # , then for every h € H, Pr(\h\ > y) < exp(-(y/||/i||^). 
Thus, for e > 1 and every j, 

Vj < e\\h\\^ )ogW(eN/j), Zj < e sup \\h\\^ ]ogW (eN/j). 

hen 

Hence, if d^ = sup heH \\h\\^ , then 

E 2 '4 J ^ I E 2J log 2//3 (eiV/2^ 

ey/ud^ p yfi\og l ' p {eN/i) ~g ey/ud^, p (f)p(i), 

and in a similar fashion, 

/u(M) £\/w||/i|| 1 /, /3 \/i'log l//3 (eiV/i) ~£ ev^H/iH^^W- 

Using the same argument, if h £ L g then Pr(|/i| > ||/i||z, y) < and 
for any < e < q/2 - 1, Vj = \\h\\ Lq (N/fiW*. If sup h€H \\h\\ Lq = d Lq , 
q > 2 and c gi£ = 1 — 2(1 +e)/q then 

i'i • \ 1/2 

lo S2 4 \ //v\( 1 + £ )/9 

x - ^ ..■,.,,,,m(u c i/« 1 ^ _1 i— . rz l -tv \ 



F u (i) <V^d Lq E 2 J (iV/2^) 2 ( 1+£ )/« < ^v^£ 9 >A [-) 



J CqlVud Lq (t) qy£ { 
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and 

fu(h,i) < C~ly/u\\h\\ Lq Vi 1 — j ~ C~ly/u\\h\\L q <f>q, e (i). 

Combining these observations with the estimates of Lemma 14.81 and noting 
that if so > then R SQ j(h) < y/udi, cj) qte (\I\), one reaches the following 
corollary. 

Corollary 4.9 Let (O u ,s)s>s be a sequence of junctionals and for s < so let 
6u,s = 0. If H is bounded in L q for q > 2, then on f2i )U n Sl2,u H ^3,u, /or 
euery h £ H and every I C {1, iV} 

ifVs<N, 

^g(A s /i) 2 (X^ < 6 UyS (A s h) + c-^\\A s h\\L q <t> q> e{\I\l 
and ifr] s > N then 

1/2 



Kiel J 



2 ' 1/2 

(5> 2 (Xi) ) < ^^, s (A s / i ) + c-^^ 9 ^, £ (|I|). 
^4 similar bound holds when H is bounded in L<^ . 

5 Estimates on Q iiU and the choice of functionals 

We will begin by showing that £l2,u is a large set, almost regardless of any 
assumptions on cj>, an observation that is based on the same idea as Lemma 

ED 

Lemma 5.1 There exist absolute constants c\ and c% such that, for every 
e > and u > c\/e, Pr(^2,«) > 1 — 2 exp(— C2£wq so ). 

Proof. Recall that by Lemma 14.41 f° r an Y random variable Y, with prob- 
ability at least 1 — 2exp(— ciue£log(eN/£)), for every integer k > u£, 

k 

£ (Y?T < fu(Y,k). (5.1) 

i=ul+l 
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Let £ = £ so , and since n s ~ £ s \og(eN / £ s ) and |A s i7| < 2 Vs , then for u > 
cs/e, (|5.ip holds uniformly for every /i G A S H with probability at least 
1 — 2 exp(— c^u£7] s ). The analogous claim holds for functions in H s as well, 
with the uniform bound of F u replacing f u . Summing over all s > so and 
since (rj s ) grows exponentially, the claim follows. ■ 

Since 0*2 u is always large, and since will behave in a very similar way 
when so > 0, the crucial point in the construction of a good decomposition 
is a correct choice of 9 U s and estimates on £l\ u . 

The functionals 9 U)S capture the geometry of H, and thus have to be 
selected according to the information one has on the class. We will present 
two examples of such choices, each leading to one of our two main results. 
The first one will be based on "global" structure like metric entropy, while 
the second uses accurate estimates on each "chain" . 

5.1 The ball — global estimates 

Let /j, be an unconditional measure on R n , set H = {{t, •) : t G B%} to be 
a class of linear functionals on (E. n ,fi) - and from here on we will identify 
the class {<£,•) : t G T} with its indexing set T. We will also assume that 
fj, satisfies the p-small diameter, L q moment assumption for some p > 2 
and q > 2; that is, p, is supported in Kin V p B2, and for every x G R , 

||(av)IU g < «2|kll^- 

Let K4 > 10 be an absolute constant to be fixed later, set 2 S1 ~ n s for 
5 < 1/2 - l/2(p - 1), and put 

Vs = K4 2 s+Sl max{log(en/2 s+Sl ), 1}. 

Note that sq = as long as 770 ~ 2 Sl log(en/2 Sl ) > log(eiV), i.e., ifn 5 log(n) > 
log(eA r ) - which we will assume is the case, since our main interest in when 
N ~ n. 

If X = (x\, ...,x n ) is distributed according to (i then for every 1 < 
£ < n, set M e = ||(Ei =1 (xf )*) 1/2 ||l 00 . Define the following functionals 
(which, in this case, will be constants depending only on u and s): let 9 Uj q = 
cv^yV/^^ 1 ^ 1 / 2 - 1 /^, if 2 S+Sl < n, set 6 U)S = c^u^n^n^+^/P 
and if rj s > n put 6 U)S = c- v /nr?y 2 2~ 2S//n , where c = c(ki,p, 5). 

Theorem 5.2 For every K\, p > 2 and 5 < 1/2 — l/2(p — 1) there exist 
constants c\ % C2 and C3 that depend only on K\, p and 5 for which the following 
holds. There is an (rj s ) s>0 - admissible sequence of B^, for which, if u > c\, 
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then Pr(fii )U ) > 1 — exp(— C2n s ) and 

sup y~]O u MA s t,-)) < c 3 y/uy/n. 

tfc ^2 s >0 

Observe that by the p-small diameter assumption, M% < p n l / p i l / 2 ~ l / p . Also, 
since \x is unconditional, then for every I C {1, n} and v supported on /, 

lKOlk£Nk M iJi- ( 5 - 2 ) 

Indeed, by the unconditionality of /x, (xi, ...,x n ) has the same distribution 
as (s\Xi, e n x n ). Hence, for every r > 1 

||<<V>lk ~(E X E £ | ^a^D^ < (E x W 2 (J> 2 * 2 r/ 2 
<v^||v||^M m . 

We will also need a few ^ entropy estimates. Set B^ 2 = {v G M n : 
II (w, -/H^a — 1}' an d for K,L C W 1 denote by N(K, L) the minimal number 
of translates of L needed to cover K. 

Lemma 5.3 If I C {l,...,n} then for every e > 0, log N (B 2, sB^ 2 ) < 
M| 2 |/e 2 . Moreover, for e < 1, log iV(P£, eP^) < nlog(2/e). 

Proof. By the dual Sudakov inequality (see, e.g. |20j). if Py 11 is a unit 
ball of a norm on R J and G = (ffi)i6i is a standard Gaussian vector on 
K J , then log N(B$,eB\\ y) < (E||G||) 2 /e 2 . Since ||/||^ 2 < Eexp(/ 2 ) and 
(Sie/ ^f) 1 ^ 2 — ^l-TI a l mos t surely, then by changing the order of integration, 

E|| G/cM m <E x E G (exp((^ ffi x i ) 2 /c 2 M | 2 | )|X) < 2 

for a suitable absolute constant c, proving the first part. 

For the second part, note that N(B%, £B^ 2 ) < N{B2,B^, 2 )-N{B^, 2 ,eB^ 2 ). 
By the first part, logN^B^, B^, 2 ) < n, while a standard volumetric estimate 
shows that N(B^ 2 ,eB^ 2 ) < (5/e) n . ■ 

Next, let us define the sets T s . If 2 s+f>1 > n, let T s be a maximal 
e s separated subset of P?? relative to the ip2 norm and of cardinality 2 Va . 
If 2 S+Sl < n, let T s be a maximal e s separated subset of U 2 s+s 1 = {x G 
P2 : l su PP( :r )l < 2 S+Sl } with respect to the "02 norm, and of cardinality 2^. 
Given a vector f G P? , we wm define the functions tt s as follows. If 2 S+Sl > n, 



1 1 r 
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ir s t is a best ip2 approximation of t in T s . For 2 S+Sl < n one combines 
approximation and dimension reduction. Set s* to satisfy that 2 S * +Sl = n 
(and without loss of generality we will assume that such an integer exists). 
If v = ir Sm t, let I n /2 be the set of the largest n/2 coordinates of v, and put 
ir St -it to be the best approximation of the coordinate projection Pi n/2 v in 
T Sj ,„i, and so on. 

Lemma 5.4 There exists an absolute constant c such that for every t G B 2 , 
if s > (i.e., ifrj s > K^n), then ||(A s i, -\ ||^, 2 < c2~ 2S+B1 l n , and if < s < s* 
then ||<A s t,-)||^ 2 < c2~( s+Sl )/ 2 M 2a+ai . 

Proof. First consider s > s*. Note that || (A s t, ||^, 2 < \\{t — 7r s t,-^||^ 2 + 
|| (i — vr s „it, -)||^ 2 < e s + £ s -i, and by the covering numbers estimate from 
Lemma l5.3( in that range e s < 2~ 2a+ai / n . 

In the range s < s*, A s i = u + w, where w consists of the small- 
est 2 S+Sl_1 coordinates of Tr s t £ B 2 for some |/| = 2 S+S1 , and u is an 
e s _i-approximation of the largest 2 S+Sl_1 coordinates of ir s t. Therefore, 
||(A s t,-)||^ 2 < ||(w,-)||^ 2 + e s _i. Recall that for every such s, U 2 a+ Sl is a 
union of ( 2 s+ si ) b ans °f dimension 2 S+Sl , then 

log N{U 2 s+ Sl , eB i!2 ) <2 S+Sl log(en/2 s+Sl ) + max log N{Bi, eB^ 2 ) 

J|=2 s + S i 

<2 S+S - log(en/2 s+ ^) + M 2 a+a Je 2 . 

Note that for a suitable choice of K4, log \T S \ > 2-2 s+Sl log(en/2 s+Sl ). There- 
fore, e s < 2-^y 2 M 2S+sl , and applying \\(w,-)\\ f2 < M^M^ < 
2-( s +^)/ 2 M 2 s+ sl . m 

Proof of Theorem 15.21 Observe that ||^||^, 2 = Hy 2 )!^, and thus, by a 
standard application of Bernstein's inequality, for every integer m, 

Therefore, if w is large enough, then 

/u£ a \ 

~W ) " 2ex P(~ cu;2M ^ lo g( eAr / n ^)) - 2exp(-c 1 w 2 u£ s \og(eN/u£ s )). 
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Moreover, u£ s log(eN/u£ s ) < urj s , and for u > 1, u£ s log(eN/u£ s ) > r/ s , 
implying that with probability at least 1 — 2 exp(— C2U> 2 r/ s ), 

Also, with probability at least 1 — 2exp(— C2W 2 rj s ), if r] s > iV then 

1/2 



E^ 2 )* ^ 1/2 ii*ii^ 1/2 - 



Using Lemma 15.41 and summing the probability estimates, it is evident that 
with probability at least 1 — 2 exp(— C3W 2 r/o), the following holds: if ?] s > N 
then 

/ N \ 1 / 2 
SUp £« A ^> 2 )* < ^/^l/ 22 - 2 — /n ) 



if «4?i < i] s < N, then 



i2 N 



sup £«A s i,X;> )* £ ^ 1/2 r?y 2 2- 2S+sl /n, 



2 \i=l / 

and if s > and r/ s < K\n then 

/ui s \V2 
i 2 



sup E« A ^' *«> T < wuVWs /2 2~ {s+sl)/2 M 2 s +sl . 

Finally, since % = «42 Sl log(en/2 Sl ) then £q < 2 Sl . Moreover, |supp(7Tot)| < 
2 Sl and by (|5.2p . ||<7roi> 7IU2 — -^2 s i • Hence, with probability at least 
1 - 2exp(-c 4 w; 2 ?7o), 

/ue s \V2 

sup V«7r t, X 4 > 2 )* < wuV^Mv, . 
^ \tt J 

Since M e < p n 1 /^ 1 / 2 " 1 ^, thenPr(fii, u ) > 1-2 exp(-c 5 r ?0 ) = l-2exp(-c 5 2 Sl ) 
for the desired functionals 6 U ^ S . It remains to choose s\ and estimate X) s >o ® u ,s m 
Note that if 2 Sl ~ n 5 for 5 < 1/2 - l/2(p - 1), then 

~ vl /2 M 2S1 - Klip 2^ 2 \og l ' 2 (en/2^)n l l^ l l 2 - 1 ^ < c 6 ( Kl ,p,5)^. 

(5.3) 
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Also, 



£ s< £ r ?s 1 / 2 2-(^-)/ 2 M 2S+ , 1 

{s>0:r] a <K3n} {s:ri s <K3n} 



< KI ,p,s^ p £ 2( Si+s )( 1 /2-i/p) iog 1 /2 (en/2 ^ S i ) < C6 ( K1)P)< 5)^ 

{s:2 s+s i<n} 

(5.4) 



and 



E °s< Kl , K3 , P E 2( s+ ^)/ 2 2- 2S+sl /"< C6 ( Kl ,p,,5)^. (5.5) 

{s:rj a >K3n} {s:r/ a >K3n} 



Corollary 5.5 There exist absolute constants c\, C2 and C3 and C4 that de- 
pend on k±, K-2,p, d~, for which the following holds. If [i is as above and e > 0, 
then B2 has an (r] s ) s >o- admissible sequence (T s ) s >o for which, for u > c\/e, 
with probability at least 1 — 2exp(— C2£un) — 2 exp(— c^n 6 ), for every t G F>2 
and every I C {1, N}, 



1. ifVs<N, 

f £«A s t, X *)A ^ c ^ + c ^V^I|A s t||^ g , £ (|I|), 



and if i] s > N then 

1/2 



2. 



^E«A s t,x,» 2 ^ <c 4 e u , s . 



\ 1/2 

<vW™ + c~ly/u4> qtE (\i\). 



We will separate our treatment to the cases q > 4 and 2 < q < 4. First, if 
q > 4, let e = (q/A — l)/2 and note that c g>e > 1/2. Also, since ~ K2 
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IK*;')IUd J$«a 11(^711^25 then by the same computation as in (j5.3j) . ([5 
and ([53]) . 

£ 4 = sup ^r/y 2 ||A s t|| Lij < K i,k2,p,5 Vn. 



teB, 



2 s>0 



We thus have: 



Theorem 5.6 For every k\, K2, q > 4, p > 2 and (5 < 1/2 — l/2(p — 1), 
t/jere ezisi constants cq, c\, C2 and C3 which depend on k\, K2, p, q and 5, 
and an absolute constant C4 for which the following holds. If \i is as above, 
and N < exp(con <5 ) ; then for every u > c\, with pL N -probability at least 
1 — 2exp(— C2n s ), P a (B2) satisfies that 



sup 



1 N 



< C3ru 



n n 
N + iV 



with probability at least 1 — 2exp(— C4nr 2 ) relative to the Bernoulli random 
variables. 



Turning to the case 2 < q < 4, recall that forO < e < q/2 — 1, i?g j£ = 

2{aT7 S <jv} 'fa 2 ^ 1+£ ^ 9 ||^s' t; lli9- Assume that /x is as above and satisfies the 
p-small diameter assumption for p > q/(q/2 — 1). Then, for < e < 
q/2 - 1 - g/p (i.e. if 1 - (2(1 + e)/g) - 1/p > 0), 

B g ,e < Yl (2 s+Sl log(en/2 s+Sl )) 1 - 2 ( 1+e )/ 9 2-( s+Sl )/ 2 n 1 / p 2( s+Sl )( 1 / 2 - 1 / p ) 

{s:2 s+s l<n} 

, , w r ) 1 -2( 1 + e )/'? 

+ ^""^ 2(°~ L 

{s:2 s+s i>n} 



) (s+si)(l-2(l+e)/g)o-2("+ s i)/' 1 < 



l-2(l + e)/g-l/p' 



Therefore, one has 



Theorem 5.7 Let 2 < q < 4, p > (1-2/g)" 1 and < e < 2/q-l-q/p. If 
/1 and 5 are as above, u > l/e, and N < exp(con 5 ), then with p N probability 
at least 1 — 2exp(— c\eun) — 2exp(— C2n s ), -P^-B^) satisfies that 



sup 

teB" 



N 



i=l 



ra 



(l-2(l + e)/g) 2 \\N 



l-(2/q)-2e/q 



with probability at least 1 — 2exp(— c%nr 2 ) relative to the Bernoulli random 
variables. 
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In particular, taking e ~ 1/ log(eiV/n), then for every such N satisfying 
that N > KljK2 , q , p n, and any u > KljK2i9iP \og(eN/n), 



sup 



1 N 



N 
i=i 



, n\ 1-2/9 



5.2 Unconditional log-concave measures 

We will now present a different way of bounding VL\ )U (and £1%^ if needed) by 
estimating the moments of the increments A s h, and selecting the functionals 
9 U)S accordingly. 

For every s > and h £ H, set 

min{u£ 3+ i,N} m'm{ui 3o+ i,N} 

z 2 s (h)= ((A.fc) a (Xi)r, = e ((tt,^) 2 ^))*. 

i=l i=l 

In light of Theorem B, we will assume that is a bounded subset of 
(although what we do here can be extended to other moment assumptions), 
and thus one may control 0,2, u using <j>p for /3 = 1 and e which will be 
selected later. 

Lemma 5.8 There exist absolute constants C\, C2 and C3 for which the fol- 
lowing holds. For u > c\, with probability at least 1 — 2exp(— c\wq SQ ), for 
every s > s and every he H, Z s (h) < e\\Z s (h)\\ L2uVs+i . 

Proof. If Z is a nonnegative random variable then Pr(Z > e\\Z\\L ) < 
exp(— q). Thus, for a fixed s and every h G H, Z s (h) < e\\Z s (h)\\L 2uv 
with probability at least 1 — exp(— 2un s+ i). Since \og\A s H\ < n s+ i and 
because there are at most exp(u£ s+ i \og(eN/u£ s +i)) < exp(n7? s+ i) subsets of 
{1, N} of cardinality u£ s +i, the same probability estimate holds uniformly 
for every h 6 H (with a different constant). Summing the probabilities for 
every s > so and repeating the same argument for H SQ concludes the proof. 

■ 

Next, one has to control the moments appearing in Lemma 15.81 which is 
based on the following result, due to Latala [TT] . 

Theorem 5.9 Let X±, ...,X m be independent, distributed according to a 
nonnegative random variable X . Then for every p > 1, 

Y; j x i\h v ~ |~ ^— J : max{l,p/m} < r < pj . 



32 



Definition 5.10 If X is a random variable, for every p > 1 set 

ii vii \\ x h q 
\\ x \\(p) = sup — . 

The (p)-norms are a local version of the ip2 norm, and clearly ||X||( p ) < 
||X||^, 2 . Using those norms one may obtain a more compact expression for 
the required moments. 

Lemma 5.11 There exist an absolute constant c such that for every h 6 H, 
every s > so and every u > 0, 

1 /2 

\\ z s(h)\\L 2 1 < cy/urj s / +1 \\A s h\\ {2 u V3+1 ) 



and 



1 /2 

\Zs (h)\\ L2uVso+1 < cVuriJ 0+l \\nsM\(2u n30+1 )- 



Proof. Let Y { = h(Xi) and observe that for every m, \\{YT=i Y i) l/2 \\L p 



YULirfWl ■ Since m = ui s+ \ and p = 2wq s+ i then p/2 > m. Also, for 



ym y2||l/ 2 

every r <p, ||5^||i r < v^ll^ll^)! an d applying Theorem [57 

1/r 



E^h Vi <ii^ii W2)2 s„p^(=) 



< llv 2 ll p 

£ II* II (p/2) " 



y/\og(p/m) 
Hence, for our choice of p and m, 

m 

(IVY 2 !! 1 / 2 < ,/Vm 1/2 \\Y 2 \\ 1/2 - ,fTm l/2 \\Y\\ r 

II 2^ Yi Hip/2 ~ V^+lll^ ll( ur?s+1 ) - V*«? s+ lll y ll(2«r 7a+ i)- 

i=l 



Corollary 5.12 There exist absolute constants c\ and C2 for which the fol- 
lowing holds. If, for s > sq, 

UtS (A s h) = C 1 ^T}] / + 1 \\A 3 h\\(2ur ls + 1 ) 

and 

1 /2 

Ou,s (n so h) = civ^r? S()+ il|7r So /i|| {2m?so+1 ), 
then Pr(Qi tU ) > 1 — 2 exp(— C2ur/ S0 ). 

Next, assume that sq > 0, and thus one has to bound Pr(Qs u ). 
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Lemma 5.13 There exists absolute constants c\, C2 and C3 such that, for 
every u > c\, with probability at least 1 — 2exp(— C2«log N), for every < 
s < so and every h & H, 

( t \ V2 

and a similar bound holds for ir So h. 

Proof. Recall that for a fixed e > and every i, Pr(l^* > yj) < exp(— ei \og{eN/i)). 
Let e ~ u > 1 and observe that if Y € L^, then yj < u||y||^, 1 log(eiV/z) and 

Pr(3i <N :Y* > yi ) < exp(-ciu log N). (5.6) 

Since the cardinality of the set U s < So A S H is at most £ S<SQ 2"«+ 1 < iV C2 , 
(|5.6p holds uniformly with probability at least 1— exp(— C3U log iV) for u > C4. 
Therefore, on that event, for every < s < so an d every j, 

/ i X ^ 2 

( ^((A^) 2 J < c 5 u\\ A s h\\^ V7 log(eiV/i). 

/ ■ \ 1/2 

An identical argument holds for ( ^ =1 ((7r So /i) 2 (Xj))* j . ■ 

Therefore, the event £l\ tU U fi2,u U fi3 iU has high probability, leading to 
the following decomposition result. 

Corollary 5.14 There exist absolute constants c\ and C2 for which the fol- 
lowing holds. For every u > c\, with -probability at least 1—2 exp(— c^u log N), 
for every h € H and every I C {1, ...,N}, 

1. ifVs<N, 

fe(( A ^) 2 (^))*l £ v^7^|A s /i|| (2m?s+l) + u\\A s h\\^M\I\), 



and ifr] s > N then 

" ^sn\\(2ur) a+1 )- 



\ 1/2 

Y((A s hf(X t )r) < V^vlilWAs 
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(j> 2 Mr) < a E ^ 1 +iii A ^ii(2^ + x)+^x^i(i^i)+^o(^) ) 

ViGi" / s>s 

where R so (h) < v^ilKo^H^urn) if so = and R so (h) < ud^i(\I\) other- 



wise. 



Remark 5.15 Note that \\A s h\\^2 Ur)s+1 ) < ||A s /i||^ 2 , and thus one may take 

1 /2 

^«,s ~ Vuf] s +i\\^sh\\tp 2 ■ Ifrjo = and r] s = 2 s for s > 1, then for an almost 
optimal admissible sequence, 



S>SQ 



Although this estimate leads to an alternative proof of Theorem \4-l\ it is 
not sharp enough to prove Theorem B, as the latter requires more accurate 
bounds on ||A s /i||( 2u ^ +1 ). 



From here on we will assume that 7/0 = and that n s = 2 s for s > 1. If 

I(2uj7 s +i)- 



1 /2 

s > s ~ log AT", set 9 UiS (A s t) = ^/ur] s ' +1 \\A s h\\ { 



Theorem 5.16 There exist absolute constants c\ and c 2 for which the fol- 
lowing holds. If n is an isotropic, unconditional log -concave measure, Hnp — 
{(*)•) '■ t £ T} and (T s ) s >o is an admissible sequence of T , then for every 
u > C\, 

9 UtS ({A s t, •)) < c 2 u (2 s ||A s i||^™ + 2 s / 2 ||A s t|| J 

Proof. Let T C M. n , and identify it with the class of linear functionals 
H T = {<*,•) : t £ T} on (W n ,fi). By Borell's inequality [5], the and L 2 
norms are Ci-equivalent on ]R n , where c\ is an absolute constant, and since /x 
is isotropic, then \(t, -)\\l 2 = \\t\\i n - Moreover, there is an absolute constant 
C2 such that for every p > q and t € M n , 



[t,-)\\ Lp <cA(t,-) 



Hence, for every t £ W 1 and every r > 1, 



\L 



Vi 

<(P2VF+l)||<t,-)||(,)- 



|(V)||(r g ) < S U P 77— 1 + !!(*' ^Ife) " C2 SUP n L 7f _i + ^*'"^IW 

q<£<rq \t q<£<rq Q \t 
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Also, for any t G K n and any p > n, -)ll(p) — 2c2y f ||(*> ')ll(n)- There- 
fore, if u > 1 and rj s = 2 s < n then 

v^+ill < A fl t, •> || (2ut7fl+l) < u2 s / 2 1| < A,t, ■) ll^), 



and if 2 s > n then 



vW+ilK^t, •>ll(2ur ?s+1 ) ^ \/«— 1| <A s t, ■) || ( n ) . 

Note ([6] or [25], Proposition 3.4) that there is an isotropic convex body K 
such that for every t G R n and any 1 < p < n, \\ (t, •} \\l p (») < C3\\(t, -)\\l p (k)- 
Moreover, since [i is unconditional, K is also unconditional and using the 
Bobkov-Nazarov Theorem [7] we get that 

H(V)||£pOO < c 3 \\(t,-)\\L p (K) < C4||<i,-)|U P (^ 1 ) I 

where i^i is an isotropic image of Bf. 

The moments of every linear functional (t, •) relative to the volume mea- 
sure of an isotropic position of -Bf are well known |15| : namely, for 1 < p < n, 



1/2 



ll^^lU^-pllilU + v^ E (**)* 

\i=p+l 

Combining the two estimates, for p < n and any i G R n , 



/, \,| ^ lK*r)lli, 9 (Ki) 

9<P V ^ <Z<P 



l/2> 



V^II*IU+ ( £ (**)* 
i=g+l 



< \ZpPlkso + ll^k- 

Thus, for 2 s < n, 

2 s / 2 ||(A s t,->|| (2s) < 2 s ||A s t||^ +2 s / 2 ||A s t||^, 

and if 2 s > n, 



-^=||<A s t, •)!!(„) <2 s ||A s t|| £s 
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Note that for an almost optimal admissible sequence, 

^2 s ||A s t||,n + 2 s / 2 \\A s t\\ q <7i(T,C)+72(T,^). 



s>0 



It turns out that 71 (T,£^o) + 72(^,^2) can be completely characterized by 
the following beautiful result due to Talagrand |31| 152"] . 

Theorem 5.17 There exist absolute constants c and C for which the follow- 
ing holds. Let (yi)f =1 be independent, standard exponential variables. Then, 
for every T C M n , 

n n 

cEsupVyiti <7i(r,4o) + 72(T,£ 2 ) < CEsupVyA. 

Recall that if (2/1)^=1 are standard exponential random variables and T C 
]R n , then we denote E(T) = Esup tGT Y17=l an< ^ ^(T) = sup tgT ||t||2- 

Combining the estimates above, it follows that on Q± ufl^ u ^^3 w -fo-T 
satisfies Definition 13. II with S = 2 s \\ ■ ||^ + 2 s / 2 1| • |L» for s > so and 6* s = 

otherwise, 7 < £(T), ~ <£i, ||((^Q, i))i=ill = II (*> •) ll^i ~ 11*11^ and « ~ u - 
Therefore, B 4 < 72 (T, £ 2 ) < -E(T). 

Theorem 5.18 There exist absolute constants C\, c 2 , C3 and C4 /or which 
the following holds. For every u > c\, With fi N -probability at least 1 — 
2exp(— c 2 ulog N), the set V = P a T satisfies that 



sup 



N 
i=l 



< c 3 rV (d 2 (T)VNE(T) + (£(T)) 



with probability at least 1 — 2 exp(— C4r 2 ) u>ii/i respect to the Bernoulli random 
variables. 



5.3 Proofs of Theorems A and B 

The final step we need for the proofs of Theorem A and Theorem B is a 
version of the Gine-Zinn symmetrization Theorem (see, e.g. |14[I33|). which 
enables one to pass from the Bernoulli process indexed by random coordinate 
projections of a class of functions, to the empirical process indexed by the 
class. 
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Theorem 5.19 Let F be a class of functions and for every x > 0, set 



(3 N (x) = inf /eF Pr(\ E^i f{Xi) - E/| > x/2). Then 



/3 N (x)Pr x sup 



£/(Xi)-E/ 



i=l 



> x < 2Pr x ® e sup 



A? 



i=l 



> cc/4 



To apply Theorem 15.191 one has to identify the right value x for which 
Pn(x) > 1/2. In our case, F = H 2 , and thus one has to show that if x is 
large enoug h, then sup h£H Pr(\ h HXi) ~ E/i 2 | > x/2) < 1/2. 

Lemma 5.20 Let H be a class of functions which is bounded in L q and 
consider the empirical process indexed by F = {h? : h G H}. If q > 4 
and x > d\ VN then Pn{x) > 1/2 and the same holds if 2 < q < 4 and 



x 



>»d?r N 2 l q . 



Proof. The first part of the claim follows from an application of Cheby- 
shev's inequality, and is omitted. For the second part, fix r > 0, set 
V = (h 2 (Xi))^ =1 , and since Pr(\h 2 (X)\ > d\ q (rN/i) 2 / q ) < i/(rN) then 

Pr (y* > d 2 Lq (rN/i) 2 / q ^j < r*J ■ (i/rNY < exp(-i log(er)) = (er)~\ 

(5.7) 

Moreover, for r ~ c 2 , Pr(3i : \h 2 (Xi)\ > r d 2 L N 2 / q ) < 1/10. Hence, a 
truncation argument shows that without loss of generality we may assume 
that H/i^Iloo < rQd 2 Lq N 2 / q . Applying the L^ estimate for the largest two 
coordinates of V and (15. 7p for the rest, it follows that 

N 

Pr{\\V\\t? > ciiro + r)d 2 Lq N 2 ' q ) < £ r~* < r~ 2 . 

i=3 

Hence, under the truncation assumption, 

N N N 

E\ £ h 2 (X t ) - Eh 2 \ <E X E £ \ J2^ih 2 (Xi)\ < E x (£ h^X,)) 1 / 2 = E\\V\ 

i=l i=X 
< q r,d 2 N 2h i 



i=l 



(N 
2 



showing that it suffices to take x ~„ d 2 T N 2 / q as claimed. 
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Since x/N is well within our range, one may complete the proofs of 
Theorem A and Theorem B. 
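Proof of Theorem A. For $q > 4$, let $\rho_{r,u} \sim_{\kappa_1,\kappa_2,q} ru\sqrt{n/N}$ for $u \gtrsim_{\kappa_1,\kappa_2,q} c_1$ and $r > c_2$. If $2 < q < 4$, set $\rho_{r,u} \sim_{\kappa_1,\kappa_2,q} ru(n/N)^{1-2/q}$ for $u \gtrsim_{\kappa_1,\kappa_2,q} \log(eN/n)$ and $r > c_3$. Then,

$$\begin{aligned}
&\Pr_X\Big(\sup_{t \in S^{n-1}} \Big|\frac{1}{N}\sum_{i=1}^N \langle t, X_i\rangle^2 - \mathbb{E}\langle t, X\rangle^2\Big| > \rho_{r,u}\Big) \\
&\quad\le 4\,\mathbb{E}_X \Pr_\varepsilon\Big(\sup_{t \in S^{n-1}} \Big|\frac{1}{N}\sum_{i=1}^N \varepsilon_i \langle t, X_i\rangle^2\Big| > \rho_{r,u}/4\Big) \\
&\quad\le 4\Pr_X\big((\Omega_{1,u} \cap \Omega_{2,u} \cap \Omega_{3,u})^c\big) + 4\sup_{(X_i)_{i=1}^N \in \Omega_{1,u}\cap\Omega_{2,u}\cap\Omega_{3,u}} \Pr_\varepsilon\Big(\sup_{t \in S^{n-1}} \Big|\frac{1}{N}\sum_{i=1}^N \varepsilon_i \langle t, X_i\rangle^2\Big| > \rho_{r,u}/4\Big) \\
&\quad\le \exp(-c_4 n).
\end{aligned}$$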



Proof of the quantitative Bai-Yin Theorem. 

To prove the quantitative version of the Bai-Yin Theorem, one has to combine Theorem A with a conditioning argument. Consider the vector $X = (\xi_1, \ldots, \xi_n)$ with $\xi \in L_q$ for some $q > 4$, and let $\nu$ be the measure on $\mathbb{R}^n$ given by $\nu = X \,|\, cn^{1/p}B_p^n$. That is, $\nu$ is obtained by conditioning X to the unconditional body $cn^{1/p}B_p^n$ for a suitable choice of c and p. Clearly, $\nu$ is unconditional and satisfies the p-small diameter $L_q$ moment assumption, and thus falls within the realm of Theorem A. Therefore, if the event $\mathcal{A} = \{\max_{i \le N} \|X_i\|_{\ell_p^n} < cn^{1/p}\}$ has high enough probability, the quantitative version of the Bai-Yin Theorem follows from Theorem A, because for every event $\mathcal{B}$,

$$\Pr\big((X_i)_{i=1}^N \in \mathcal{B}\big) \le \Pr\big((X_i)_{i=1}^N \in \mathcal{B} \,\big|\, X_1, \ldots, X_N \in cn^{1/p}B_p^n\big)\Pr(\mathcal{A}) + \Pr(\mathcal{A}^c).$$

Hence, the final step in the proof of our version of the Bai-Yin Theorem is to show that if $\xi \in L_q$ for $q > 4$, there is some $p > 2$ for which $\mathcal{A}$ has large measure.

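Recall that for every $v \in \mathbb{R}^n$, $\|v\|_{\ell_{p,\infty}^n} = \max_{k \le n} k^{1/p} v_k^*$, where $(v_k^*)_{k=1}^n$ is the non-increasing rearrangement of $(|v_k|)_{k=1}^n$. Since $\|v\|_{\ell_r^n} \lesssim_{p,r} n^{1/r - 1/p}\|v\|_{\ell_{p,\infty}^n}$ for every $r < p$, it suffices to show that $\max_{i \le N} \|X_i\|_{\ell_{p,\infty}^n} \lesssim n^{1/p}$ for some $p > 2$ with high enough probability.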
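The weak-$\ell_p$ quasi-norm recalled above is easy to compute. Its comparison with $\ell_r$ for $r < p$ follows from $v_k^* \le \|v\|_{\ell_{p,\infty}^n} k^{-1/p}$ and $\sum_{k \le n} k^{-r/p} \le n^{1-r/p}/(1-r/p)$. Together these give the explicit constant $(1-r/p)^{-1/r}$.

The Python sketch below is our own illustration, and its function names are hypothetical. The random test vectors are only an example.

```python
import math
import random

# Our own illustration (not code from the paper): the weak-l_p quasi-norm and
# the elementary comparison ||v||_r <= C ||v||_{p,infty} for 0 < r < p.

def weak_lp_norm(v: list[float], p: float) -> float:
    """||v||_{p,infty} = max_k k^{1/p} v*_k, v* = non-increasing rearrangement of |v|."""
    v_star = sorted((abs(x) for x in v), reverse=True)
    return max(k ** (1.0 / p) * vk for k, vk in enumerate(v_star, start=1))

def lr_norm(v: list[float], r: float) -> float:
    """The usual l_r norm (sum_i |v_i|^r)^{1/r}."""
    return sum(abs(x) ** r for x in v) ** (1.0 / r)

def weak_to_lr_constant(n: int, p: float, r: float) -> float:
    """C with ||v||_r <= C ||v||_{p,infty} on R^n, valid for 0 < r < p.

    Uses sum_{k<=n} k^{-r/p} <= n^{1-r/p} / (1 - r/p), so
    C = (1 - r/p)^{-1/r} * n^{1/r - 1/p}.
    """
    if not 0 < r < p:
        raise ValueError("need 0 < r < p")
    return (1 - r / p) ** (-1.0 / r) * n ** (1.0 / r - 1.0 / p)

if __name__ == "__main__":
    rng = random.Random(0)
    n, p, r = 1000, 3.0, 2.5
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    print(lr_norm(v, r), weak_to_lr_constant(n, p, r) * weak_lp_norm(v, p))
```

In particular, if $\|v\|_{\ell_{p,\infty}^n} \le c n^{1/p}$, then $\|v\|_{\ell_r^n} \le c(1-r/p)^{-1/r} n^{1/r}$. A weak-$\ell_p$ bound at level $n^{1/p}$ therefore yields an $\ell_r$ bound at level $n^{1/r}$ for every $2 < r < p$.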

Lemma 5.21 For every $q > 4$ and $2 < p < q$, there exist constants $c_1$ and $c_2$ that depend on q and p for which the following holds. If $\xi \in L_q$, $X = (\xi_1, \ldots, \xi_n)$ and $X_1, \ldots, X_N$ are independent copies of X, then

$$\Pr\Big(\max_{1 \le i \le N} \|X_i\|_{\ell_{p,\infty}^n} > c_1 \|\xi\|_{L_q} n^{1/p}\Big) \le c_2 \frac{N}{n^{q/p - 1}}.$$



Proof. Let $(\xi_k^*)_{k=1}^n$ be the non-increasing rearrangement of $(|\xi_k|)_{k=1}^n$. If $A = \|\xi\|_{L_q}$, then $\Pr(|\xi| > At) \le t^{-q}$, and for every $1 \le k \le n$, $\Pr(\xi_k^* > t) \le \binom{n}{k}\big(\Pr(|\xi| > t)\big)^k$. Therefore, if $p < q$ and $y > e$, then

$$\Pr\big(\xi_k^* > A(ny/k)^{1/p}\big) \le \exp\Big(k\log(en/k) - k\frac{q}{p}\log(ny/k)\Big) \le \exp\Big(-k\Big(\frac{q}{p} - 1\Big)\log(ny/k)\Big).$$

Using this estimate for every $k = 2^j$ and summing the probabilities, it follows that for every q and p there is a constant $c_{q,p}$ for which $\|X\|_{\ell_{p,\infty}^n} \lesssim A n^{1/p}$ with probability at least $1 - c_{q,p}n^{1-q/p}$. In particular, $\Pr(\max_{i \le N}\|X_i\|_{\ell_{p,\infty}^n} > c_1 A n^{1/p}) \le c_{q,p} N/n^{(q/p)-1}$, as claimed. ■
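Two steps of the proof of Lemma 5.21 can be checked numerically. The first is the exponent manipulation: the difference of the two exponents is exactly $k\log(y/e)$, so the second bound dominates the first precisely when $y \ge e$. The second is the behaviour of the dyadic sum over $k = 2^j$, which is of order $n^{1-q/p}$.

The Python sketch below is our own check. The values q = 8 and p = 3 are only an example. The sketch evaluates the analytic bounds; it does not simulate the random variables.

```python
import math

# Our own numerical check (not code from the paper) of the exponent step and
# the dyadic sum in the proof of Lemma 5.21.

def exponents(n: int, k: int, q: float, p: float, y: float) -> tuple[float, float]:
    """(before, after) exponents of the tail bound for xi*_k > A (n y / k)^{1/p}.

    before = k log(e n / k) - k (q/p) log(n y / k)   [binomial bound + Markov]
    after  = -k (q/p - 1) log(n y / k)
    after - before = k log(y / e), so before <= after exactly when y >= e.
    """
    before = k * math.log(math.e * n / k) - k * (q / p) * math.log(n * y / k)
    after = -k * (q / p - 1.0) * math.log(n * y / k)
    return before, after

def dyadic_bound(n: int, q: float, p: float, y: float) -> float:
    """Sum of exp(after-exponent) over k = 1, 2, 4, ... with k <= n."""
    total, k = 0.0, 1
    while k <= n:
        total += math.exp(exponents(n, k, q, p, y)[1])
        k *= 2
    return total

if __name__ == "__main__":
    q, p = 8.0, 3.0  # example values with q > 4 and 2 < p < q
    for n in (16, 256, 4096):
        # ratio to n^{1 - q/p}: stays bounded in n (dominated by the k = 1 term)
        print(n, dyadic_bound(n, q, p, math.e) / n ** (1 - q / p))
```

The k = 1 term equals $(ny)^{1-q/p}$, and the remaining dyadic terms decay much faster. This is consistent with the failure probability $c_{q,p}n^{1-q/p}$ stated in the proof.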

Combining Lemma 5.21 with Theorem A concludes the proof of the quantitative Bai-Yin Theorem. ■
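A small simulation can make the statement of the quantitative Bai-Yin Theorem concrete, although it is not part of the proof. The sketch below is our own illustration and assumes NumPy. It uses symmetric Student-t entries with `df` degrees of freedom, rescaled to unit variance, as one example of a law with $\mathbb{E}|\xi|^q < \infty$ exactly for $q < df$. The helper name is hypothetical. For df > 4, one expects $s_{\min}/\sqrt{N}$ and $s_{\max}/\sqrt{N}$ to lie near $1 - \sqrt{\beta}$ and $1 + \sqrt{\beta}$.

```python
import numpy as np

def normalized_extreme_singular_values(N: int, n: int, df: float, seed: int = 0) -> tuple[float, float]:
    """Return (s_min(A)/sqrt(N), s_max(A)/sqrt(N)) for an N x n random matrix A.

    Entries are i.i.d. symmetric Student-t(df), rescaled so that E xi^2 = 1
    (this needs df > 2, since Var(t_df) = df / (df - 2)).
    """
    if df <= 2:
        raise ValueError("df must exceed 2 for unit-variance rescaling")
    rng = np.random.default_rng(seed)
    A = rng.standard_t(df, size=(N, n)) / np.sqrt(df / (df - 2.0))
    s = np.linalg.svd(A, compute_uv=False)  # singular values only
    return float(s.min() / np.sqrt(N)), float(s.max() / np.sqrt(N))

if __name__ == "__main__":
    N, n = 2000, 200  # aspect ratio beta = n / N = 0.1
    beta = n / N
    s_min, s_max = normalized_extreme_singular_values(N, n, df=10.0)
    print(f"s_min/sqrt(N) = {s_min:.3f}  vs  1 - sqrt(beta) = {1 - beta ** 0.5:.3f}")
    print(f"s_max/sqrt(N) = {s_max:.3f}  vs  1 + sqrt(beta) = {1 + beta ** 0.5:.3f}")
```

With df ≤ 4 the fourth moment is infinite. By the Bai-Yin Theorem, $s_{\max}/\sqrt{N}$ is then almost surely unbounded in the limit, although a single finite sample need not show this.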

Proof of Theorem B. If $r \sim u$, then with probability at least $1 - 2\exp(-cu^2)$ with respect to the Bernoulli random variables,



sup 

vdP a T 



1 N 



N 

i=l 



< , m E{T K E2{T) 



Since $d_2(T)E(T)/\sqrt{N}$ is a "legal" choice in the Giné-Zinn symmetrization theorem, the proof is concluded. ■